Preface


Sensitivity analysis addresses one of the most persistent of all questions: what would 
happen if? Within the field of demography, sensitivity analysis might be said to 
have originated with the groundbreaking, yet very different, papers of Hamilton 
(1966) and Keyfitz (1971). Hamilton calculated the sensitivity of the intrinsic rate 
of increase, r, to changes in age-specific mortality. He interpreted r as a measure of 
individual fitness, capturing the effects of the phenotype on mortality and fertility. 
The resulting sensitivities are measures of the strength of natural selection on 
aging and senescence. Keyfitz calculated sensitivities of population growth rate, life 
expectancy, and other quantities. Taking a demographic perspective, he interpreted 
the results as showing the linkage between age-specific rates at the individual level 
and the “intrinsic” rates expressed at the population level. Both these perspectives 
on sensitivity analysis continue to play major roles in demography and population 
biology. Connecting traits to individual rates, and those rates to measures of fitness, 
is the foundation of evolutionary demography. Understanding linkages between 
individual rates and population outcomes informs population projections, policy and 
spending, conservation, health demography, ecotoxicology, and so on. 

Fast forward to today. The diversity of demographic models, of the outcomes 
that can be calculated, and the power of the mathematical tools available to analyze 
them far exceed those of 50 years ago. Much of this progress is due to the 
formulation of demographic models in terms of matrices. P. H. Leslie formulated 
matrix models in the 1940s (Leslie 1945), but they were mostly ignored for 
two decades until revitalized by a series of studies in the 1960s (Keyfitz 1964; 
Lefkovitch 1965; Rogers 1968). In the very first issue of the first volume of the 
new journal Demography, Nathan Keyfitz described population projection as a 
matrix operator (Keyfitz 1964). This book relies on matrix formulations generalized 
beyond projections to age-structured and stage-structured populations, to linear and 
nonlinear dynamics, to time-invariant and time-varying vital rates, and to multistate 
models that combine age and stage information.

The matrix formulation provides easily computable outcomes at the level of
the individual (e.g., risks of mortality, longevity, lifetime reproduction), the cohort 
(e.g., distributions of age or stage at death), and the population (e.g., population 
growth rate). The mathematical connection between matrix models and the theory of
finite-state Markov chains makes it possible to go beyond expected outcomes to calculate
variances and higher moments and to take full advantage of the stochasticity of 
demographic events at the individual level (individual stochasticity). 

The sensitivity analysis of these diverse outcomes is made possible by the 
even more recently developed mathematical tool of matrix calculus (Magnus and 
Neudecker 1988). Matrix calculus permits easy differentiation of scalar-, vector-, 
and matrix-valued functions of scalar-, vector-, and matrix-valued arguments. This 
entire book is an application of these methods to demographic problems. 


Organization The book is (imperfectly) divided into five parts. Part I contains an 
introduction and a summary of the matrix calculus methods that are used throughout 
the book. 

Part II analyzes linear models for population growth, longevity, and reproduction. 
In linear models, the per-capita vital rates are independent of population size and 
structure. When the rates are also time-invariant, these models lead to a stable 
age or stage structure and exponential growth. The rate of growth is one of 
the most fundamental outcomes of stable population theory. Chapter 3 analyzes 
the sensitivity of population growth rate from three directions: differentiation of 
the characteristic equation, eigenvalue perturbation theory, and matrix calculus, 
providing the first application of the methods that form the basis of the subsequent 
chapters. Chapter 4 focuses on longevity, presenting the sensitivity analysis of 
life expectancy, variance in longevity, and life disparity. Chapter 5 introduces the 
important concept of individual stochasticity (stochastic outcomes of probabilistic 
transitions in the life cycle) and explores its effects on longevity, net reproductive 
rate, birth intervals, and age at reproduction. Some aspects of time variation 
are introduced, including the first appearance in the book of the powerful
vec-permutation matrix method to describe temporally varying environments.

A critical first step in the construction of any demographic model is the choice 
of the individual state (i-state) variables that capture the relevant information about 
individuals. Age, developmental stage, body size, and a variety of other properties 
have been used as i-states. However, it is often the case that a combination of age and 
some other characteristic is necessary to describe individuals. Chapter 6 presents the 
sensitivity analysis of such models, using the vec-permutation method to construct 
multistate models and matrix calculus to differentiate the results. 

Part III relaxes the assumption of time invariance. Chapter 7 presents the 
sensitivity analysis of transient dynamics, i.e., dynamics that happen in the short 
term, before asymptotic behavior appears. Short-term population growth and structure
may differ in important ways from the growth and structure implied by
stable population theory. Chapter 7 explores these differences, for cases where 
the vital rates may be fixed, varying, or even nonlinear. Chapter 8 analyzes 
periodic models. Such models appear in a variety of guises: as matrix products
describing periodic (e.g., seasonal) environmental variation, as matrix products
describing distinct processes embedded within an apparently single projection
matrix, and in the construction of multistate matrix models. In each case, the goal
is to describe the sensitivity of some overall outcome, calculated from the entire 
periodic matrix product, to changes in parameters affecting each component of the 
matrix. Chapter 9 analyzes population growth in stochastic environments and the 
problem of decomposing differences in stochastic growth rates into components 
due to the environment and to the vital rates. This requires a combination of the 
first-order approximate decomposition known as life table response experiment 
(LTRE) analysis with the more specialized Kitagawa-Keyfitz decomposition and 
has potential implications far beyond the stochastic environment case. 

Part IV analyzes nonlinear models, including density-dependent models, 
frequency-dependent models (e.g., models for the interaction of the sexes), 
nonlinear models for subsidized populations, and a nonlinear approach to the 
sensitivity of the stable structure and the reproductive value of linear models. 

Finally, Part V returns to the analysis of the Markov chain models that form the 
basis of many of the demographic calculations throughout the book. These chapters 
take a more mathematical approach to the sensitivity analysis of Markov chains, 
including some aspects that have yet to find wide demographic application (but the 
potential is there). Chapter 11 analyzes discrete-time chains, both the absorbing 
chains familiar in demography (death is an absorbing state in most models) and 
ergodic chains that include no absorbing states. Chapter 12 presents the sensitivity 
analysis of continuous-time absorbing Markov chains, using as an example a
model for the stages of colorectal cancer.

Most of the chapters here are based on, or extended from, papers that have 
appeared in a variety of journals in ecology, population biology, human demography, 
and applied mathematics. There is overlap among the chapters. This is a feature, 
not a bug, because it means that similar calculations are revisited with different 
perspectives, different derivations, and different examples. When choices arose, I 
tried to choose the presentation that would make things easier for the reader. 

The material here certainly does not exhaust the applications of matrix calculus 
in the sensitivity analysis of demographic models. I have tried to point out directions 
for further development.


Part I
Introductory and Methodological 


Chapter 1 
Introduction: Sensitivity Analysis – What and Why?


1.1 Introduction 


Demography is a science that connects individual processes and events to the 
development of cohorts and then to the dynamics of populations. It does so with 
mathematical models that distinguish among individuals based on their
characteristics.¹ The most familiar such model is the life table, which records mortality and
fertility of the individual as a function of age, and is used to calculate properties of 
cohorts (e.g., the distribution of age at death) and populations (e.g., the intrinsic rate 
of increase). 

The life table is the most familiar, but demography has proceeded far beyond 
that in both models and analyses. In any case, though, a model is defined first by its 
structure (the states of individuals and the transitions possible among them), then 
by the rates at which individuals develop, survive, and reproduce throughout the 
life cycle, then by the functional dependence of those rates (time-invariant or
time-varying, density-independent or density-dependent, deterministic or stochastic), and
finally by the values of the parameters that define the rates. A set of parameters 
operating within a given model generates the demographic outcomes calculated 
from the model (population growth rate, population structure, equilibria, cycles, 
measures of longevity, state occupancy times, transient behavior and projections, 
and so on). The sensitivity problem is to understand how the outcome[s] change in 
response to changes in the parameters. 


¹Technically, these characteristics are known as individual state variables, or i-states (Metz and
Diekmann 1986; Caswell 2001). Their task is to capture all the information about the individual’s 
history that is relevant to determining its future fate, and a major task of demography is to discover 
those aspects of the individual necessary for a successful i-state (e.g., de Vries and Caswell 2017). 
In the models considered here, the population state (p-state) is a distribution function over the set of 
i-states. Thus, for example, age as an i-state leads to a population described by its age distribution.

Why should we care about the effects of change?


• We may be concerned with particular changes that we have reason to believe will
  occur (due to, e.g., changes in society, changes in the environment, changes in
  policy) and want to know how they will affect the outcome of interest.

• We may want to evaluate the effect of changes that we hope to cause as matters
  of policy, or to compare alternative policies for their effects.

• We may be interested in evolutionary demographic questions. Natural selection
  is a process that explores the consequences for fitness (which is itself a
  demographic outcome) of changing the phenotypic traits that influence demographic
  parameters.

• We may want to identify the parameters that have the biggest effect on the outcome,
  in order to allocate sampling or measurement efforts where they are most needed.

• At a very basic level, we may simply want to know how the system works,
  how the outcomes are determined. Just as an empirical study might include
  experiments to manipulate factors and see how outcomes change, sensitivity
  analysis of a mathematical model reveals how outcomes respond to parameter
  changes.


It is not an overstatement to say that no model is ever fully understood if it does
not include a sensitivity analysis. 


1.2 Sensitivity, Calculus, and Matrix Calculus 


The change in an outcome in response to a change in a parameter can be treated as a 
problem in differential calculus. Let $\xi$ denote some dependent variable and $\theta$ some
parameter. The sensitivity problem can be approached via the derivative


\[ \frac{d\xi}{d\theta} \tag{1.1} \]


or the elasticity, or proportional sensitivity²


\[ \frac{\epsilon \xi}{\epsilon \theta} = \frac{\theta}{\xi} \frac{d\xi}{d\theta} = \frac{d \log \xi}{d \log \theta} \tag{1.2} \]


Note that I will use “sensitivity analysis” to refer generically to both sensitivity 
and elasticity. 
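
As a minimal illustration of the derivative and the elasticity defined above, consider a hypothetical model (not one taken from the text) in which mortality is a constant hazard $\theta$, so that life expectancy is $\xi = 1/\theta$. The function names and the parameter value are assumptions made for this sketch.

```python
def life_expectancy(theta):
    """Life expectancy under a constant mortality hazard theta (illustrative model)."""
    return 1.0 / theta


def sensitivity(theta):
    """Derivative d(xi)/d(theta) of xi = 1/theta, obtained analytically."""
    return -1.0 / theta**2


def elasticity(theta):
    """Proportional sensitivity (theta / xi) * d(xi)/d(theta)."""
    return theta / life_expectancy(theta) * sensitivity(theta)


theta = 0.02                   # hypothetical hazard of 2% per year
xi = life_expectancy(theta)    # about 50 years
s = sensitivity(theta)         # about -2500 years per unit change in the hazard
e = elasticity(theta)          # about -1, whatever the value of theta
```

In this sketch the sensitivity depends strongly on the units and magnitude of $\theta$, while the elasticity is a dimensionless $-1$: a 1% increase in the hazard reduces life expectancy by approximately 1%.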

The sensitivity problem is a challenging task, rather than an exercise in undergraduate
calculus, because the dependence of $\xi$ on $\theta$ may be complicated, and
because $\xi$ may be a scalar (e.g., life expectancy at birth, or population growth rate)
or a vector (e.g., a stable stage distribution or a projected population structure)
or a matrix (e.g., the matrix of mean occupancy times). Similarly, $\theta$ may be a
scalar (e.g., the Gompertz rate of aging) or a vector (e.g., the age schedule of
mortality rates), or a matrix (e.g., the transition matrix among life cycle stages).
In addition, the chains of causation in even simple demographic models are
complicated. Tracing the causal chains from a set of parameters (of which there
may be many) to a set of outcomes (again, many) with complicated interactions is
hard.


²There seems to be no standard notation for elasticities; the one I am using here is based on a
suggestion by Samuelson (1947).

This book is an in-depth exploration of sensitivity analyses based on matrix
formulations of demographic calculations. Matrix formulations are designed
precisely to map transformations from one multidimensional space to another. Thus
they simplify computations, clarify notation, and increase analytical power.³

The premise of this book is that demography as a discipline is neither defined 
by, nor limited to, a taxon. You will find here examples and analyses of humans, of
non-human animals, and of plants. Human demography and population biology have
mutually informed each other from the beginning, and I see no reason for them to 
stop now. 

It is important to remember that the diversity of complex life histories among 
the species that occupy our world poses a challenge to demographic analysis that is 
identical to the challenge posed by the complicated lives of humans. The dynamics 
of health status, family structure, or socio-economic status introduce complications 
to the life course exactly comparable to the dynamics of size growth in plants, 
metamorphosis in insects, or breeding status in birds. 


A bit of history The earliest focus of demographic sensitivity analysis was
population growth rate $\lambda$ (or the intrinsic rate of increase $r = \log \lambda$) in linear
demographic models. Hamilton (1966) was the first to solve this, in the context
of the evolution of senescence. Demetrius (1969) derived a corresponding matrix
expression, apparently unaware of Hamilton’s results. Goodman (1971) was the
first to notice the connection to reproductive value (see Chap. 3). Keyfitz (1971)
derived the sensitivity of $r$, but also of life expectancy, mean age at death, and other
outcomes.

All these analyses were based on age-classified demographic models. These
results were generalized to stage-classified models by applying eigenvalue perturbation
theory (Caswell 1978), followed by elasticity calculations (de Kroon et al.
1986), sensitivities of eigenvectors (Caswell 1982), lower-level parameters (Caswell
1989b), second derivatives of eigenvalues (Caswell 1996), the population spreading
rate (Neubert and Caswell 2000), transient dynamics (Caswell 2007), and other
things. Following the important early work of Tuljapurkar (1990), the sensitivity
analysis of stochastic models developed in parallel with that of deterministic models
(e.g., Tuljapurkar et al. 2003; Haridas and Tuljapurkar 2005; Horvitz et al. 2005;
Steinsaltz et al. 2011).


³That does not mean that calculations made by other means are wrong. I am a methodological
pluralist, and I do not believe that it is necessary to attack other methods in order to justify the use
of matrix methods.

Matrix calculus, permitting differentiation of scalar-, vector-, or matrix-valued 
functions of scalar, vector, or matrix arguments, began to be developed in the 
1960s (see Nel (1980) for some history and comparison of different methods). 
The approach we will use here was introduced by Neudecker (1969) and expanded 
by Magnus and Neudecker (1985). A comprehensive, but mathematically difficult, 
treatment is given in Magnus and Neudecker (1988). Chapter 2 gives a brief 
presentation of the matrix calculus methods we will utilize in this book. 


1.3 Some Issues 


Sensitivity analysis is more than an algebraic exercise; it is a tool for making 
inferences and drawing conclusions about substantive demographic issues. It is 
useful to bring to the discussion a perspective on some questions. 


1.3.1 Prospective and Retrospective Analyses: Sensitivity and 
Decomposition 


If some variable $\xi$ is a function of a set of parameters $\theta_1, \ldots, \theta_p$, then
$\partial \xi / \partial \theta_i$ gives
the rate of change of $\xi$ in response to a change in the $i$th parameter, holding the
rest constant. Contrary to what is sometimes assumed, this calculation requires no 
assumption that it is actually possible to change the parameters. If the flight velocity 
of pigs is one of the parameters in the model, the analysis will happily answer the 
question of what would happen if pigs could fly. 

Nor is there any assumption that changes in $\theta_i$ have ever happened in the past.
The sensitivity analysis looks forward, asking what would happen if this or that 
parameter were to change. It is thus referred to as prospective analysis (Caswell 
2000). 

On the other hand, suppose you find yourself considering two values of $\xi$ that
have resulted from two different situations (times, places, conditions), each with its
own set of parameters:


\[ \theta_1^{(1)}, \theta_2^{(1)}, \ldots \longrightarrow \xi^{(1)} \]

\[ \theta_1^{(2)}, \theta_2^{(2)}, \ldots \longrightarrow \xi^{(2)} \]


You ask, what caused the difference between $\xi^{(1)}$ and $\xi^{(2)}$? Knowing the derivatives
$\partial \xi / \partial \theta_i$ cannot tell you, because you are not asking the counterfactual question of what
would happen if, but the very factual question of what actually happened between
the two situations. This is a retrospective analysis, familiar to human demographers
as a decomposition problem (e.g., Kitagawa 1955; Canudas Romo 2003).

One widely used approach to understanding the causes of observed differences
is life table response experiment (LTRE) analysis,⁴ which uses a first-order
approximation to decompose the differences,


\[ \Delta \xi \approx \sum_i \frac{\partial \xi}{\partial \theta_i} \left( \theta_i^{(2)} - \theta_i^{(1)} \right) \tag{1.3} \]


The $i$th term in the summation is the contribution of the difference in the parameter
$\theta_i$ to the difference in the outcome, $\Delta \xi$. These contributions reflect both the
sensitivity of $\xi$ to the parameters and the differences between conditions in each
of the parameters. Parameters to which $\xi$ is not very sensitive can make large
contributions if the difference $\Delta \theta_i$ is big enough. Parameters to which $\xi$ is very
sensitive can make small contributions if $\theta_i$ does not change much. The matrix
calculus version of this decomposition is given in Sect. 2.9, applied to differences
in life disparity in Chap. 4, to periodic environments in Chap. 8, and explored in the
challenging context of stochastic models in Chap. 9.
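
The decomposition in (1.3) can be sketched for a hypothetical two-parameter outcome, here the product of a survival probability and a fertility. The outcome function, the parameter values, and the choice to evaluate the derivatives at the midpoint of the two parameter sets are all assumptions of this example, not prescriptions from the text.

```python
def outcome(theta):
    """Hypothetical outcome: a survival probability times a fertility."""
    survival, fertility = theta
    return survival * fertility


def gradient(theta):
    """Analytic partial derivatives of the outcome with respect to each parameter."""
    survival, fertility = theta
    return [fertility, survival]


def ltre_contributions(theta_1, theta_2):
    """First-order contribution of each parameter difference to the outcome difference.

    The derivatives are evaluated at the midpoint of the two parameter sets,
    which is an assumption of this sketch.
    """
    midpoint = [(a + b) / 2 for a, b in zip(theta_1, theta_2)]
    grad = gradient(midpoint)
    return [g * (b - a) for g, a, b in zip(grad, theta_1, theta_2)]


theta_1 = [0.5, 2.0]   # condition 1: survival, fertility (hypothetical values)
theta_2 = [0.6, 1.5]   # condition 2
contributions = ltre_contributions(theta_1, theta_2)
delta_xi = outcome(theta_2) - outcome(theta_1)
```

Here the higher survival contributes about $+0.175$ and the lower fertility about $-0.275$, which together account for the observed difference of about $-0.1$. For this bilinear outcome the midpoint evaluation happens to make the first-order decomposition exact; in general it is only an approximation.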

The distinction between prospective and retrospective analysis is obvious once 
the questions they address are specified, but it has challenged a number of authors 
(e.g., Wisdom and Mills 1997; Manlik et al. 2017). A particularly insightful 
discussion of these ideas, in somewhat different terminology, appears in Nathan 
Keyfitz’s essay, How do we know the facts of demography?, which now appears as 
Chapter 20 of Keyfitz and Caswell (2005). 


1.3.2 Uncertainty Propagation 


Suppose that $\xi$ is a function of $\theta$, but $\theta$ is known only imperfectly. Then $\xi$ is
also known only imperfectly; the uncertainty in $\theta$ is propagated from $\theta$ to $\xi$. The
sensitivity $d\xi/d\theta$ alone says nothing about uncertainty, and the uncertainty in $\xi$
says nothing about the sensitivity.

Uncertainty propagation can be calculated by simulation if a probability
distribution is known that can describe the uncertainty in $\theta$. Sampling from this
distribution and calculating $\xi$ for each sampled parameter gives the distribution
of $\xi$ resulting from the uncertainty in $\theta$ (e.g., Caswell et al. 1998; Salomon et al.
2001). If the distribution of $\theta$ comes from an empirical set of measurements,
this approach converges to the bootstrap (Efron and Tibshirani 1993). If $\theta$ has
a parametric distribution (e.g., the multivariate normal distribution returned by
maximum likelihood estimation) the technique is sometimes known as a parametric
bootstrap (e.g., Regehr et al. 2010).


⁴This awkward but well entrenched nomenclature was created when I was trying to understand the
interpretation of experiments in ecotoxicology in which laboratory cohorts would be exposed to
some noxious substance, and a life table (mortality and fertility schedule) measured as a response
variable (Caswell 1989a). It soon became apparent that the method could be applied to any
comparison of different conditions, and that the response could be any demographic variable. See
(Caswell 2001, Chapter 10) for details.

Sensitivity analysis can contribute to uncertainty propagation analysis through
the first-order, small-variance approximation to the variance in $\xi$,


\[ V(\xi) \approx \sum_{i,j} \left( \frac{\partial \xi}{\partial \theta_i} \right) \left( \frac{\partial \xi}{\partial \theta_j} \right) \mathrm{Cov}(\theta_i, \theta_j) \tag{1.4} \]


Notice again that sensitivity does not, by itself, say anything about uncertainty, but
it does show how the (co)variance in parameters will propagate to the variance in
the outcome $\xi$.
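
The double sum in (1.4) translates directly into code. The sketch below uses a hypothetical outcome (the product of two parameters) and a hypothetical covariance matrix for the parameter estimates; both are assumptions chosen only to make the arithmetic easy to follow.

```python
def approximate_variance(gradient, covariance):
    """First-order variance: sum over i, j of g_i * g_j * Cov(theta_i, theta_j)."""
    p = len(gradient)
    return sum(
        gradient[i] * gradient[j] * covariance[i][j]
        for i in range(p)
        for j in range(p)
    )


# Hypothetical outcome xi = theta_1 * theta_2, evaluated at theta = (0.5, 2.0)
theta = [0.5, 2.0]
gradient = [theta[1], theta[0]]   # partial derivatives of xi at theta

# Hypothetical covariance matrix of the parameter estimates
covariance = [[0.0004, -0.0010],
              [-0.0010, 0.0400]]

variance_xi = approximate_variance(gradient, covariance)
```

With these numbers the three distinct terms are $0.0016$, $-0.0020$ (the two covariance terms combined), and $0.0100$, giving an approximate variance of $0.0096$. The negative covariance reduces the variance of the outcome relative to what independent parameters would give.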


1.3.3 Why Not Just Simulate? 


If you work on these problems, or if you apply these methods in particular studies, 
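eventually you will be asked (often by a reviewer), why not just do it all by
simulation? Just evaluate $\xi$ at the value $\theta$, and at $\theta + \Delta \theta$, and then approximate
the derivative as


\[ \frac{\Delta \xi}{\Delta \theta} = \frac{\xi(\theta + \Delta \theta) - \xi(\theta)}{\Delta \theta} \tag{1.5} \]


for some very small value of $\Delta \theta$.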

Three answers come to mind. First, if $\theta$ and the model are of sufficiently high
dimension, there can be a lot of these perturbations to be calculated. For example, 
population projections of the type analyzed by Caswell and Sanchez Gassen (2015), 
with 102 ages, 2 sexes, 3 vital rates, and projections on the order of 50 years, 
have over 30,000 parameters. A numerical perturbation of each of these would be 
painful. 

Second, the computation of derivatives by numerical perturbations is a notoriously
ill-behaved problem. A standard reference on computations in applied
mathematics says that this approximation is “almost guaranteed to produce inaccurate
results” (Press et al. 1992, p. 185). It is subject to truncation error (caused
by making the perturbation too large) and roundoff error (caused by making the
perturbation too small). In some applications these errors will be unimportant, but
in others they can be crucial (e.g., Hunter and Caswell 2009, for an example in
mark-recapture analysis).
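
The trade-off between truncation error and roundoff error can be seen in a small experiment with the approximation in (1.5). The exponential function is used here only as a stand-in for a model outcome whose exact derivative is known; the step sizes are arbitrary choices for this sketch.

```python
import math


def forward_difference(f, theta, step):
    """Approximate df/dtheta by the one-sided difference quotient."""
    return (f(theta + step) - f(theta)) / step


theta = 1.0
exact = math.exp(theta)   # the derivative of exp is exp itself

# Absolute error of the approximation for a range of perturbation sizes
errors = {
    step: abs(forward_difference(math.exp, theta, step) - exact)
    for step in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15)
}

for step, error in errors.items():
    print(f"step = {step:.0e}   absolute error = {error:.3e}")
```

In double-precision arithmetic the error first shrinks as the step is reduced (less truncation error) and then grows again (more roundoff error), so neither a large nor a very small perturbation is safe, and the best step size depends on the function being differentiated.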

Third, and more basic and telling: an exact answer is always an improvement 
over an approximation. When an exact answer is available, in an easily computable
form, there must be strong arguments to support the idea that a less efficient and
less accurate approximation is just as good. And having both exact and approximate 
methods is even better. 

These arguments apply to numerical calculation of derivatives. But simulation 
has an important place in analyzing scenarios; i.e., the results of specified collections
of parameters, usually with multiple and large differences among them. When
population projections are reported with “high,” “medium,” and “low” fertility 
scenarios, the point is to compare a range of multivariate alternatives. Other 
examples include comparisons of screening procedures for colorectal cancer (Wu 
et al. 2006), or projections based on IPCC global climate models (e.g., Jenouvrier 
et al. 2012). In principle, sensitivity analysis could support these calculations 
by suggesting interesting scenarios, highlighting the parameters with the biggest 
impact on the outcome. 


1.3.4 Sensitivity and Identifying Targets for Intervention 


To intervene is to change something. Population biologists concerned with endangered
species would like to intervene to increase the population growth rate. Those
concerned with invasive pests would like to do the same, but in the opposite 
direction. Human demographers focused on aging societies wonder about how 
policies would change age distributions or dependency ratios. In all these cases, 
the interventions operate through changes in demographic parameters, and thus 
sensitivity analysis can reveal something about their effects. 

This logic has led to the use of prospective perturbation analyses in conservation 
biology, using the sensitivity or elasticity of population growth rate to identify 
promising targets for intervention. The first such use involved the loggerhead sea 
turtle (Crouse et al. 1987). Standard practice at the time was to focus on protecting 
eggs and hatchlings on nesting beaches. But a sensitivity analysis showed that 
population growth rate was not very sensitive to these stages, and much more 
sensitive to changes in survival of adults at sea. This led to a recommendation, and 
then a policy, to install “turtle excluder devices” on the nets used in coastal shrimp 
fisheries in the United States, to reduce mortality due to adult turtles being captured 
in those nets. 

This basic idea has become a part of the toolkit for conservation biology, but 
has also fallen victim to a kind of magical thinking that first makes unrealistic 
expectations of the sensitivity analysis and then blames the analysis for failing 
to meet those expectations. For a recent example see Manlik et al. (2017); for a 
thorough description of the issues and some of their solutions, see Caswell (2001, 
Chapter 18). 

The fact remains that knowing the sensitivity of some outcome $\xi$ to some
parameter $\theta$ gives the rate of change of $\xi$ in response to an intervention that changes
$\theta$. That is valuable information to have in considering the various interventions that
might bring about a desired change.


1.3.5 The Dream of Easy Interpretation


This book is full of long and complicated formulas. Occasionally, these formulas
yield easy, readily apparent, qualitative interpretations. But not often. There is a
reason for this. The formulas are complicated because the processes are complicated,
and because the results are given at a high level of generality. Chapter 10,
for example, analyzes the sensitivity of nonlinear, density-dependent models. It 
derives a complicated formula for the sensitivity of any function of the equilibrium 
population, to changes in any parameter affecting any of the vital rates, in any age- 
or stage-specific way, for any choice of stage classification and any survival, fertility, 
and transition rates, with any pattern of density dependence, for any species with any 
kind of life history. Accounting for that web of dependencies, in such generality, 
makes finding an easily interpretable formula an unlikely dream. 

Not an impossible dream, but in general, insights of that kind arise from 
simplifying general methods to address particular situations. Specifying a particular 
demographic structure, choosing an outcome variable of interest, and carefully 
specifying the functional dependencies, if done skillfully, can lead to qualitative 
results. 


1.4 The Importance of Change 


Questions of change lurk in almost every demographic (every scientific?) study. We 
ask how things have changed in the past, how they differ among populations in the 
present, and how they will, or may, change in the future. Even apparently simple 
descriptive statements (the results of a census in a particular time and place, for 
example) are almost immediately examined in comparison with other times and/or 
places. 

Sensitivity analysis is a powerful tool for analyzing change, in the special case of 
demographic outcomes that are calculated as functions of some set of parameters. 
As the chapters to come will make clear, this covers a wide landscape of interesting 
demographic questions. And the list is not yet complete.


Chapter 2
Matrix Calculus and Notation


2.1 Introduction: Can It Possibly Be That Simple? 


In October of 2005, I scribbled in a notebook, “can it possibly be that simple?” I 
was referring to the sensitivity of transient dynamics (the eventual results appear 
in Chap. 7), and had just begun to use matrix calculus as a tool. The answer to my 
question was yes. It can be that simple. 

This book relies on this set of mathematical techniques. This chapter introduces 
the basics, which will be used throughout the text. For more information, I 
recommend four sources in particular. The most complete treatment, but not the 
easiest starting point, is the book by Magnus and Neudecker (1988). More accessible 
introductions can be found in the paper by Magnus and Neudecker (1985) and 
especially the text by Abadir and Magnus (2005). A review paper by Nel (1980) 
is helpful in placing the Magnus-Neudecker formulation in the context of other 
attempts at a calculus of matrices. 

Sensitivity analysis asks how much change in an outcome variable y is caused 
by a change in some parameter x. At its most basic level, and with some reasonable 
assumptions about the continuity and differentiability of the functional relationships 
involved, the solution is given by differential calculus. If y is a function of x, then 
the derivative 


\[ \frac{dy}{dx} \]


tells how y responds to a change in x, i.e., the sensitivity of y to a change in x.
However, the outcomes of a demographic calculation may be scalar-valued (e.g.,
the population growth rate $\lambda$), vector-valued (e.g., the stable stage distribution),
or matrix-valued (e.g., the fundamental matrix). Any of these outcomes may be
functions of scalar-valued parameters (e.g., the Gompertz aging rate), vector-valued
parameters (e.g., the mortality schedule), or matrix-valued parameters (e.g., the
transition matrix). Thus, sensitivity analysis in demography requires
more than the simple scalar derivative dy/dx. We want a consistent and flexible approach
to differentiating


\[
\left\{ \begin{array}{l} \text{scalar-valued} \\ \text{vector-valued} \\ \text{matrix-valued} \end{array} \right\}
\text{ functions of }
\left\{ \begin{array}{l} \text{scalar} \\ \text{vector} \\ \text{matrix} \end{array} \right\}
\text{ arguments}
\]


2.2 Notation and Matrix Operations 


2.2.1 Notation 


Matrices are denoted by upper case bold symbols (e.g., $\mathbf{A}$), vectors (usually) by
lower case bold symbols ($\mathbf{n}$). The $(i, j)$ entry of the matrix $\mathbf{A}$ is $a_{ij}$, and the $i$th
entry of the vector $\mathbf{n}$ is $n_i$. Sometimes we will use MATLAB notation, and write


\[ \mathbf{X}(i, :) = \text{row } i \text{ of } \mathbf{X} \tag{2.1} \]
\[ \mathbf{X}(:, j) = \text{column } j \text{ of } \mathbf{X} \tag{2.2} \]

The notation

\[ \left( x_{ij} \right) \]

denotes a matrix whose $(i, j)$ entry is $x_{ij}$. For example,

\[ \left( \frac{dy_i}{dx_j} \right) \]

is the matrix whose $(i, j)$ entry is the derivative of $y_i$ with respect to $x_j$.

The transpose of $\mathbf{X}$ is $\mathbf{X}^{\mathsf{T}}$. Logarithms are natural logarithms. The vector norm
$\|\mathbf{x}\|$ is, unless noted otherwise, the 1-norm. The symbol $\mathcal{D}(\mathbf{x})$ denotes the square
matrix with $\mathbf{x}$ on the diagonal and zeros elsewhere. The symbol $\mathbf{1}$ denotes a vector
of ones. The vector $\mathbf{e}_i$ is a unit vector with 1 in the $i$th entry and zeros elsewhere.
The identity matrix is $\mathbf{I}$. Where necessary for clarity, the dimension of matrices or
vectors will be indicated by a subscript. Thus $\mathbf{I}_s$ is an $s \times s$ identity matrix, $\mathbf{1}_s$ is an
$s \times 1$ vector of ones, and $\mathbf{X}_{m \times n}$ is an $m \times n$ matrix.

In some places (Chaps. 6 and 10) block-structured matrices appear; these are
denoted by either $\mathbb{A}$ or $\tilde{\mathbf{A}}$, depending on the context and the role of the matrix.


2.2.2 Operations


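In addition to the familiar matrix product $\mathbf{A}\mathbf{B}$, we will also use the Hadamard, or
elementwise, product

$$\mathbf{A} \circ \mathbf{B} = \left( a_{ij} b_{ij} \right) \tag{2.3}$$

and the Kronecker product

$$\mathbf{A} \otimes \mathbf{B} = \left( a_{ij} \mathbf{B} \right). \tag{2.4}$$

The Hadamard product requires that $\mathbf{A}$ and $\mathbf{B}$ be the same size. The Kronecker
product is defined for any sizes of $\mathbf{A}$ and $\mathbf{B}$. Some useful properties of the Kronecker
product include

\begin{align}
(\mathbf{A} \otimes \mathbf{B})^{-1} &= \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \tag{2.5} \\
(\mathbf{A} \otimes \mathbf{B})^{\mathsf{T}} &= \mathbf{A}^{\mathsf{T}} \otimes \mathbf{B}^{\mathsf{T}} \tag{2.6} \\
\mathbf{A} \otimes (\mathbf{B} + \mathbf{C}) &= (\mathbf{A} \otimes \mathbf{B}) + (\mathbf{A} \otimes \mathbf{C}) \tag{2.7}
\end{align}

and, provided that the matrices are of the right size for the products to be defined,

$$(\mathbf{A}_1 \otimes \mathbf{B}_1)(\mathbf{A}_2 \otimes \mathbf{B}_2) = (\mathbf{A}_1 \mathbf{A}_2 \otimes \mathbf{B}_1 \mathbf{B}_2). \tag{2.8}$$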
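
The two products and the identities (2.6) and (2.8) are easy to check numerically. The following is a minimal sketch in plain Python, with matrices stored as nested lists; the helper names (`matmul`, `hadamard`, `kron`) and the integer test matrices are illustrative assumptions, not part of the original text.

```python
# Pure-Python sketch of the Hadamard and Kronecker products (nested lists).

def matmul(A, B):
    """Ordinary matrix product AB."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def hadamard(A, B):
    """Elementwise product, Eq. (2.3); A and B must be the same size."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def kron(A, B):
    """Kronecker product, Eq. (2.4): each a_ij is replaced by the block a_ij * B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

# Small integer matrices (made-up values) so that equality checks are exact.
A1, B1 = [[1, 2], [3, 4]], [[0, 1], [1, 1]]
A2, B2 = [[2, 0], [1, 3]], [[1, 2], [0, 1]]

had = hadamard(A1, B1)                        # [[0, 2], [3, 4]]
lhs = matmul(kron(A1, B1), kron(A2, B2))      # left side of (2.8)
rhs = kron(matmul(A1, A2), matmul(B1, B2))    # right side of (2.8)
mixed_product_holds = lhs == rhs
transpose_holds = transpose(kron(A1, B1)) == kron(transpose(A1), transpose(B1))  # (2.6)
print(had, mixed_product_holds, transpose_holds)
```

Integer entries are used only so that the two sides can be compared with `==`; with floating-point data a small tolerance would be needed.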


2.2.3 The Vec Operator and Vec-Permutation Matrix


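The vec operator transforms an $m \times n$ matrix $\mathbf{A}$ into an $mn \times 1$ vector by stacking the
columns one above the next,

$$\operatorname{vec} \mathbf{A} = \begin{pmatrix} \mathbf{A}(:, 1) \\ \vdots \\ \mathbf{A}(:, n) \end{pmatrix}. \tag{2.9}$$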


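For example,

$$\operatorname{vec} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ c \\ b \\ d \end{pmatrix}. \tag{2.10}$$

The vec of $\mathbf{A}$ and the vec of $\mathbf{A}^{\mathsf{T}}$ are rearrangements of the same entries; they are
related by

$$\operatorname{vec} \mathbf{A}^{\mathsf{T}} = \mathbf{K}_{m,n} \operatorname{vec} \mathbf{A} \tag{2.11}$$

where $\mathbf{A}$ is $m \times n$ and $\mathbf{K}_{m,n}$ is the vec-permutation matrix (Henderson and Searle
1981) or commutation matrix (Magnus and Neudecker 1979). The vec-permutation
matrix can be calculated as

$$\mathbf{K}_{m,n} = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \mathbf{E}_{ij} \otimes \mathbf{E}_{ij}^{\mathsf{T}} \right) \tag{2.12}$$

where $\mathbf{E}_{ij}$ is a matrix, of dimension $m \times n$, with a 1 in the (i, j) entry and zeros
elsewhere. Like any permutation matrix, $\mathbf{K}^{-1} = \mathbf{K}^{\mathsf{T}}$.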

The vec operator and the vec-permutation matrix are particularly important in 
multistate models (e.g., age x stage-classified models), where they are used in both 
the formulation and analysis of the models (e.g., Caswell 2012, 2014; Caswell and 
Salguero-Gomez 2013; Caswell et al. 2018); see also Chap. 6. Extensions to an 
arbitrary number of dimensions, so-called hyperstate models, have been presented 
by Roth and Caswell (2016). 
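
As a concrete illustration of the vec operator and of the construction of the vec-permutation matrix as a double sum of Kronecker products, here is a small sketch in plain Python. The function names (`vec`, `vec_permutation`, and so on) and the 2 x 3 test matrix are assumptions made for the example; the double loop is written for clarity rather than speed.

```python
# Sketch: vec operator and vec-permutation matrix K_{m,n} built from the
# double sum of (E_ij kron E_ij^T). Matrices are plain nested lists.

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Kronecker product of two nested-list matrices."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def vec(A):
    """Stack the columns of A into one flat list (column-major order)."""
    return [x for col in zip(*A) for x in col]

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

def vec_permutation(m, n):
    """K_{m,n} = sum_i sum_j (E_ij kron E_ij^T), with each E_ij of size m x n."""
    size = m * n
    K = [[0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            E = [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(m)]
            term = kron(E, transpose(E))
            K = [[k + t for k, t in zip(rk, rt)] for rk, rt in zip(K, term)]
    return K

A = [[1, 2, 3],
     [4, 5, 6]]                      # m = 2, n = 3 (made-up entries)
K = vec_permutation(2, 3)
vec_A = vec(A)                       # [1, 4, 2, 5, 3, 6]
vec_At = vec(transpose(A))           # [1, 2, 3, 4, 5, 6]
print(matvec(K, vec_A) == vec_At)    # numerical check of vec A^T = K vec A
```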


2.2.4 Roth’s Theorem 


The vec operator and the Kronecker product are connected by a theorem due to Roth 
(1934): 


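$$\operatorname{vec}(\mathbf{A}\mathbf{B}\mathbf{C}) = \left( \mathbf{C}^{\mathsf{T}} \otimes \mathbf{A} \right) \operatorname{vec} \mathbf{B}. \tag{2.13}$$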


We will often want to obtain the vec of a matrix that appears in the middle of a 
product; we will use Roth’s theorem repeatedly. 
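
Roth's theorem can be verified directly on small matrices. The sketch below, in plain Python with nested lists, compares the vec of a triple product with the Kronecker-product form; the helper names and the integer matrices are illustrative assumptions.

```python
# Sketch: numerical check of Roth's theorem, vec(ABC) = (C^T kron A) vec B.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def vec(A):
    """Column-stacking vec operator, returned as a flat list."""
    return [x for col in zip(*A) for x in col]

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

# Conformable integer matrices (made-up values): A is 3 x 2, B is 2 x 3, C is 3 x 2.
A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2], [-1, 3, 1]]
C = [[2, 1], [0, 1], [4, -2]]

lhs = vec(matmul(matmul(A, B), C))             # vec(ABC), 6 entries
rhs = matvec(kron(transpose(C), A), vec(B))    # (C^T kron A) vec B
print(lhs == rhs)
```

The middle matrix B is the one whose vec is isolated, which is the pattern used repeatedly in the derivations that follow.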


2.3 Defining Matrix Derivatives 


The derivative of a scalar y with respect to a scalar x is familiar. What, however, 
does it mean to speak of the derivative of a scalar with respect to a vector, or of 
a vector with respect to another vector, or any other combination? These can be 
defined in more than one way and the choice is critical (Nel 1980; Magnus and 
Neudecker 1985). This book relies on the notation due to Magnus and Neudecker, 
because it makes certain operations possible and consistent. 


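• If $x$ and $y$ are scalars, the derivative of $y$ with respect to $x$ is the familiar derivative
$dy/dx$.

• If $\mathbf{y}$ is an $n \times 1$ vector and $x$ a scalar, the derivative of $\mathbf{y}$ with respect to $x$ is the
$n \times 1$ column vector

$$\frac{d\mathbf{y}}{dx} = \begin{pmatrix} \dfrac{dy_1}{dx} \\ \vdots \\ \dfrac{dy_n}{dx} \end{pmatrix}. \tag{2.14}$$

• If $y$ is a scalar and $\mathbf{x}$ an $m \times 1$ vector, the derivative of $y$ with respect to $\mathbf{x}$ is the
$1 \times m$ row vector (called the gradient vector)

$$\frac{dy}{d\mathbf{x}^{\mathsf{T}}} = \begin{pmatrix} \dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_m} \end{pmatrix}. \tag{2.15}$$

Note the orientation of $d\mathbf{y}/dx$ as a column vector and $dy/d\mathbf{x}^{\mathsf{T}}$ as a row vector.

• If $\mathbf{y}$ is an $n \times 1$ vector and $\mathbf{x}$ an $m \times 1$ vector, the derivative of $\mathbf{y}$ with respect to $\mathbf{x}$
is defined to be the $n \times m$ matrix whose (i, j) entry is the derivative of $y_i$ with
respect to $x_j$, i.e.,

$$\frac{d\mathbf{y}}{d\mathbf{x}^{\mathsf{T}}} = \left( \frac{dy_i}{dx_j} \right) \tag{2.16}$$

(this matrix is called the Jacobian matrix).

• Derivatives involving matrices are written by first transforming the matrices
into vectors using the vec operator, and then applying the rules for vector
differentiation to the resulting vectors. Thus, the derivative of the $m \times n$ matrix
$\mathbf{Y}$ with respect to the $p \times q$ matrix $\mathbf{X}$ is the $mn \times pq$ matrix

$$\frac{d \operatorname{vec} \mathbf{Y}}{d (\operatorname{vec} \mathbf{X})^{\mathsf{T}}}. \tag{2.17}$$

From now on, I will write $\operatorname{vec}^{\mathsf{T}} \mathbf{X}$ for $(\operatorname{vec} \mathbf{X})^{\mathsf{T}}$.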


2.4 The Chain Rule 


The chain rule for differentiation is your friend. The Magnus-Neudecker notation, 
unlike some alternatives, extends the familiar scalar chain rule to derivatives of 
vectors and matrices (Nel 1980; Magnus and Neudecker 1985). If u (size m x 1) is 
a function of v (size n x 1) and v is a function of x (size p x 1), then 


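$$\underbrace{\frac{d\mathbf{u}}{d\mathbf{x}^{\mathsf{T}}}}_{m \times p} = \underbrace{\left( \frac{d\mathbf{u}}{d\mathbf{v}^{\mathsf{T}}} \right)}_{m \times n} \underbrace{\left( \frac{d\mathbf{v}}{d\mathbf{x}^{\mathsf{T}}} \right)}_{n \times p}. \tag{2.18}$$

Notice that the dimensions are correct, and that the order of the multiplication
matters. Checking dimensional consistency in this way is a useful way to find errors.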


2.5 Derivatives from Differentials 


The key to the matrix calculus of Magnus and Neudecker (1988) is the relationship 
between the differential and the derivative of a function. Experience suggests that, 
for many readers of this book, this relationship is shrouded in the mists of long-ago 
calculus classes. 


2.5.1 Differentials of Scalar Function 


Start with scalars. Suppose that $y = f(x)$ is a differentiable function at $x = x_0$.
Then the derivative of $y$ with respect to $x$ at the point $x_0$ is defined as

$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}. \tag{2.19}$$

Now define the differential of $y$. This is customarily denoted $dy$, but for the moment,
I will denote it by $\mathrm{c}y$. The differential of $y$ at $x_0$ is a function of $h$, defined by

$$\mathrm{c}y(x_0, h) = f'(x_0)\, h. \tag{2.20}$$

There is no requirement that $h$ be “small.” Since $x$ is a function of itself, $x = g(x)$,
with $g'(x) = 1$, we also have $\mathrm{c}x(x_0, h) = g'(x_0)\, h = h$. Thus the ratio of the
differential of $y$ and the differential of $x$ is

$$\frac{\mathrm{c}y(x_0, h)}{\mathrm{c}x(x_0, h)} = \frac{f'(x_0)\, h}{h} = f'(x_0). \tag{2.21}$$


That is, the derivative is equal to the ratio of the differentials.
Now, return to the standard notation of $dy$ for the differential of $y$. This gives
two meanings to the familiar notation for derivatives,

$$\left. \frac{dy}{dx} \right|_{x_0} = f'(x_0). \tag{2.22}$$

The left hand side can be regarded either as equivalent to the limit (2.19) or the ratio
of the differentials given by (2.21). Mathematicians are strangely unconcerned with
this ambiguity (e.g., Hardy 1952).

All this leads to a set of familiar rules for calculating differentials that guarantee
that they can be used to create derivatives. A few of these, for scalars, are

\begin{align}
d(u + v) &= du + dv \tag{2.23} \\
d(cu) &= c\, du \tag{2.24} \\
d(uv) &= u\,(dv) + (du)\,v \tag{2.25} \\
d(e^{u}) &= e^{u}\, du \tag{2.26} \\
d(\log u) &= \frac{du}{u} \tag{2.27}
\end{align}

If $y = f(x_1, x_2)$, then the total differential is

$$dy = \frac{\partial f}{\partial x_1}\, dx_1 + \frac{\partial f}{\partial x_2}\, dx_2. \tag{2.28}$$

Derivatives can be constructed from these expressions at will by dividing by
differentials. For example, dividing (2.23) by $dx$ gives $d(u + v)/dx = du/dx +
dv/dx$. From (2.28), we have

\begin{align}
\frac{dy}{dx_1} &= \frac{\partial f}{\partial x_1} + \frac{\partial f}{\partial x_2} \frac{dx_2}{dx_1} \tag{2.29} \\
\frac{dy}{dx_2} &= \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_2} + \frac{\partial f}{\partial x_2}. \tag{2.30}
\end{align}


2.5.2 Differentials of Vectors and Matrices 


To extend these concepts to matrices, we define the differential of a matrix (or 
vector) as the matrix (or vector) of differentials of the elements; i.e., 


$$d\mathbf{X} = \left( dx_{ij} \right). \tag{2.31}$$


This definition leads to some basic rules for differentials of matrices: 


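\begin{align}
d(c\mathbf{U}) &= c\,(d\mathbf{U}) \tag{2.32} \\
d(\mathbf{U} + \mathbf{V}) &= d\mathbf{U} + d\mathbf{V} \tag{2.33} \\
d(\mathbf{U}\mathbf{V}) &= (d\mathbf{U})\mathbf{V} + \mathbf{U}(d\mathbf{V}) \tag{2.34} \\
d(\mathbf{U} \otimes \mathbf{V}) &= (d\mathbf{U}) \otimes \mathbf{V} + \mathbf{U} \otimes (d\mathbf{V}) \tag{2.35} \\
d(\mathbf{U} \circ \mathbf{V}) &= (d\mathbf{U}) \circ \mathbf{V} + \mathbf{U} \circ (d\mathbf{V}) \tag{2.36} \\
d \operatorname{vec} \mathbf{U} &= \operatorname{vec} d\mathbf{U} \tag{2.37}
\end{align}

where $c$ is a constant, and, of course, the dimensions of $\mathbf{U}$ and $\mathbf{V}$ must be
conformable. The differential of an operator applied elementwise to a vector can
be obtained from the differentials of the elements. For example, suppose $\mathbf{u}$ is an $s \times 1$
vector, and the exponential is applied elementwise. Then

\begin{align}
d(\exp(\mathbf{u})) &= \begin{pmatrix} e^{u_1}\, du_1 \\ \vdots \\ e^{u_s}\, du_s \end{pmatrix} \tag{2.38} \\
&= \mathcal{D}\left[ \exp(\mathbf{u}) \right] d\mathbf{u}. \tag{2.39}
\end{align}

If $\mathbf{y}$ is a function of $\mathbf{x}_1$ and $\mathbf{x}_2$, the total differential is given just as in (2.28), by

$$d\mathbf{y} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}_1^{\mathsf{T}}}\, d\mathbf{x}_1 + \frac{\partial \mathbf{y}}{\partial \mathbf{x}_2^{\mathsf{T}}}\, d\mathbf{x}_2. \tag{2.40}$$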


2.6 The First Identification Theorem 


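For scalar $y$ and $x$,

$$dy = q\, dx \implies \frac{dy}{dx} = q. \tag{2.41}$$

That much is easy. But suppose that $\mathbf{y}$ is an $n \times 1$ vector function of the $m \times 1$ vector
$\mathbf{x}$. The differential $d\mathbf{y}$ is the $n \times 1$ vector

$$d\mathbf{y} = \begin{pmatrix} dy_1 \\ \vdots \\ dy_n \end{pmatrix} \tag{2.42}$$

which, by the total derivative rule, is

\begin{align}
d\mathbf{y} &= \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1}\, dx_1 + \cdots + \dfrac{\partial y_1}{\partial x_m}\, dx_m \\ \vdots \\ \dfrac{\partial y_n}{\partial x_1}\, dx_1 + \cdots + \dfrac{\partial y_n}{\partial x_m}\, dx_m \end{pmatrix} \tag{2.43} \\
&= \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_m} \\ \vdots & & \vdots \\ \dfrac{\partial y_n}{\partial x_1} & \cdots & \dfrac{\partial y_n}{\partial x_m} \end{pmatrix} \begin{pmatrix} dx_1 \\ \vdots \\ dx_m \end{pmatrix} \tag{2.44} \\
&= \mathbf{Q}\, d\mathbf{x}. \tag{2.45}
\end{align}

If these were scalars, dividing both sides by $dx$ would give $\mathbf{Q}$ as the derivative
of $\mathbf{y}$ with respect to $\mathbf{x}$. But one cannot divide by a vector. Instead, Magnus and
Neudecker proved that if it can be shown that

$$d\mathbf{y} = \mathbf{Q}\, d\mathbf{x} \tag{2.46}$$

then the derivative is

$$\frac{d\mathbf{y}}{d\mathbf{x}^{\mathsf{T}}} = \mathbf{Q}. \tag{2.47}$$

This is the First Identification Theorem of Magnus and Neudecker (1988).¹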


2.6.1 The Chain Rule and the First Identification Theorem 


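Suppose that $d\mathbf{y}$ is given by (2.46), and that $\mathbf{x}$ is in turn a function of some vector $\boldsymbol{\theta}$.
Then

$$d\mathbf{x} = \frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}}\, d\boldsymbol{\theta} \tag{2.48}$$

and

$$\frac{d\mathbf{y}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{Q}\, \frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{2.49}$$

In other words, the differential expression (2.46) can be transformed into a derivative
with respect to any vector by careful use of the chain rule. This applies equally to
more complicated expressions for the differential. Suppose that

$$d\mathbf{y} = \mathbf{Q}\, d\mathbf{x} + \mathbf{R}\, d\mathbf{z}. \tag{2.50}$$

Applying the chain rule to the differentials on the right hand side gives

$$d\mathbf{y} = \mathbf{Q}\, \frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}}\, d\boldsymbol{\theta} + \mathbf{R}\, \frac{d\mathbf{z}}{d\boldsymbol{\theta}^{\mathsf{T}}}\, d\boldsymbol{\theta} \tag{2.51}$$

for any vector $\boldsymbol{\theta}$. Thus

$$d\mathbf{y} = \left( \mathbf{Q}\, \frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathbf{R}\, \frac{d\mathbf{z}}{d\boldsymbol{\theta}^{\mathsf{T}}} \right) d\boldsymbol{\theta}, \tag{2.52}$$

and the First Identification Theorem gives

$$\frac{d\mathbf{y}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{Q}\, \frac{d\mathbf{x}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathbf{R}\, \frac{d\mathbf{z}}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{2.53}$$

¹ There is also a second identification theorem that provides the second derivatives of matrix
functions. See Shyu and Caswell (2014) for applications of this theory to the second derivatives of
measures of population growth rate.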


2.7 Elasticity 


When parameters are measured on different scales, it is sometimes helpful to 
calculate proportional effects of proportional perturbations, also called elasticities. 
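The elasticity of $y_i$ to $\theta_j$ is

$$\frac{\epsilon y_i}{\epsilon \theta_j} = \frac{\theta_j}{y_i} \frac{dy_i}{d\theta_j}. \tag{2.54}$$

For vectors $\mathbf{y}$ and $\boldsymbol{\theta}$, this becomes

$$\frac{\epsilon \mathbf{y}}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \mathcal{D}(\mathbf{y})^{-1}\, \frac{d\mathbf{y}}{d\boldsymbol{\theta}^{\mathsf{T}}}\, \mathcal{D}(\boldsymbol{\theta}). \tag{2.55}$$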


There seems to be no accepted notation for elasticities; the notation used here is 
adapted from that in Samuelson (1947). 
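
The matrix form of the elasticity is just the Jacobian rescaled on the left by the inverse of the outcomes and on the right by the parameters. A minimal sketch in plain Python follows; the toy function (a product and a square of two parameters), its analytic Jacobian, and all names are assumptions chosen so that the elasticities come out as simple numbers.

```python
# Sketch: elasticity matrix D(y)^{-1} (dy/dtheta^T) D(theta) for a toy function.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def diag(x):
    """Square matrix with the entries of x on the diagonal and zeros elsewhere."""
    n = len(x)
    return [[x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

theta = [0.5, 4.0]                               # hypothetical parameter values
y = [theta[0] * theta[1], theta[0] ** 2]         # y1 = theta1*theta2, y2 = theta1^2

# Analytic Jacobian dy/dtheta^T of the toy function above.
jacobian = [[theta[1], theta[0]],
            [2.0 * theta[0], 0.0]]

inv_diag_y = diag([1.0 / yi for yi in y])
elasticity = matmul(matmul(inv_diag_y, jacobian), diag(theta))
print(elasticity)
```

For this example the first row is (1, 1) and the second is (2, 0): a 1% change in either parameter changes y1 by about 1%, while y2 responds twice as strongly to the first parameter and not at all to the second.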


2.8 Some Useful Matrix Calculus Results 


Several matrix calculus results will be used repeatedly. Many more can be found in 
Magnus and Neudecker (1988) and Abadir and Magnus (2005). 


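1. The matrix product $\mathbf{Y} = \mathbf{A}\mathbf{B}$. Differentiate,

$$d\mathbf{Y} = (d\mathbf{A})\mathbf{B} + \mathbf{A}(d\mathbf{B}). \tag{2.56}$$

Then write (or imagine writing; with practice one does not actually need this step
explicitly)

\begin{align}
(d\mathbf{A})\mathbf{B} &= \mathbf{I}\,(d\mathbf{A})\,\mathbf{B} \tag{2.57} \\
\mathbf{A}(d\mathbf{B}) &= \mathbf{A}\,(d\mathbf{B})\,\mathbf{I} \tag{2.58}
\end{align}

and apply the vec operator and Roth’s theorem, to obtain

$$d \operatorname{vec} \mathbf{Y} = \left( \mathbf{B}^{\mathsf{T}} \otimes \mathbf{I} \right) d \operatorname{vec} \mathbf{A} + \left( \mathbf{I} \otimes \mathbf{A} \right) d \operatorname{vec} \mathbf{B}. \tag{2.59}$$

The chain rule gives, for any vector variable $\boldsymbol{\theta}$,

$$\frac{d \operatorname{vec} \mathbf{Y}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left( \mathbf{B}^{\mathsf{T}} \otimes \mathbf{I} \right) \frac{d \operatorname{vec} \mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left( \mathbf{I} \otimes \mathbf{A} \right) \frac{d \operatorname{vec} \mathbf{B}}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{2.60}$$
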
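2. The Hadamard product $\mathbf{Y} = \mathbf{A} \circ \mathbf{B}$. Differentiate the product,

$$d\mathbf{Y} = d\mathbf{A} \circ \mathbf{B} + \mathbf{A} \circ d\mathbf{B}, \tag{2.61}$$

then vec

$$d \operatorname{vec} \mathbf{Y} = d \operatorname{vec} \mathbf{A} \circ \operatorname{vec} \mathbf{B} + \operatorname{vec} \mathbf{A} \circ d \operatorname{vec} \mathbf{B}. \tag{2.62}$$

It will be useful to replace the Hadamard products, which we do using the fact
that $\mathbf{x} \circ \mathbf{y} = \mathcal{D}(\mathbf{x})\mathbf{y}$, to get

$$d \operatorname{vec} \mathbf{Y} = \mathcal{D}(\operatorname{vec} \mathbf{B})\, d \operatorname{vec} \mathbf{A} + \mathcal{D}(\operatorname{vec} \mathbf{A})\, d \operatorname{vec} \mathbf{B}. \tag{2.63}$$

The chain rule gives the derivative from the differential,

$$\frac{d \operatorname{vec} \mathbf{Y}}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathcal{D}(\operatorname{vec} \mathbf{B})\, \frac{d \operatorname{vec} \mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} + \mathcal{D}(\operatorname{vec} \mathbf{A})\, \frac{d \operatorname{vec} \mathbf{B}}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{2.64}$$
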


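3. Diagonal matrices. The diagonal matrix $\mathcal{D}(\mathbf{x})$, with the vector $\mathbf{x}$ on the diagonal
and zeros elsewhere, can be written

$$\mathcal{D}(\mathbf{x}) = \mathbf{I} \circ \left( \mathbf{1}\, \mathbf{x}^{\mathsf{T}} \right). \tag{2.65}$$

Differentiate both sides,

$$d\mathcal{D}(\mathbf{x}) = \mathbf{I} \circ \left( \mathbf{1}\, d\mathbf{x}^{\mathsf{T}} \right) \tag{2.66}$$

and vec the result

\begin{align}
d \operatorname{vec} \mathcal{D}(\mathbf{x}) &= \mathcal{D}(\operatorname{vec} \mathbf{I}) \operatorname{vec} \left( \mathbf{1}\, d\mathbf{x}^{\mathsf{T}} \right) \tag{2.67} \\
&= \mathcal{D}(\operatorname{vec} \mathbf{I}) \left( \mathbf{I} \otimes \mathbf{1} \right) d\mathbf{x}. \tag{2.68}
\end{align}

The First Identification Theorem gives

$$\frac{d \operatorname{vec} \mathcal{D}(\mathbf{x})}{d\mathbf{x}^{\mathsf{T}}} = \mathcal{D}(\operatorname{vec} \mathbf{I}) \left( \mathbf{I} \otimes \mathbf{1} \right). \tag{2.69}$$

The identity matrix in (2.65) masks the matrix $(\mathbf{1}\, \mathbf{x}^{\mathsf{T}})$, setting to zero all but
the diagonal elements. Matrices other than $\mathbf{I}$ can be used in this way to mask
entries of a matrix. For example, the transition matrix for a Leslie matrix, with a
vector of survival probabilities $\mathbf{p}$ on the subdiagonal, is obtained by setting $\mathbf{x} = \mathbf{p}$
and replacing $\mathbf{I}$ with a matrix $\mathbf{Z}$ that contains ones on the subdiagonal and zeros
elsewhere (see, e.g., Chap. 4).

Some Markov chain calculations (Chaps. 5 and 11) involve a matrix $\mathbf{N}_{\mathrm{dg}}$,
which contains the diagonal elements of $\mathbf{N}$ on the diagonal and zeros elsewhere.
This can be written

$$\mathbf{N}_{\mathrm{dg}} = \mathbf{I} \circ \mathbf{N}. \tag{2.70}$$

Differentiating and applying the vec operator yields

$$d \operatorname{vec} \mathbf{N}_{\mathrm{dg}} = \mathcal{D}(\operatorname{vec} \mathbf{I})\, d \operatorname{vec} \mathbf{N}. \tag{2.71}$$
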
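4. The Kronecker product. Differentiating the Kronecker product is a bit more
complicated (Magnus and Neudecker 1985, Theorem 11). We want an expression
for the differential of the product in terms of the differentials of the components,
something of the form

$$d \operatorname{vec}(\mathbf{A} \otimes \mathbf{B}) = \mathbf{Z}_1\, d \operatorname{vec} \mathbf{A} + \mathbf{Z}_2\, d \operatorname{vec} \mathbf{B} \tag{2.72}$$

for some matrices $\mathbf{Z}_1$ and $\mathbf{Z}_2$.

This requires a result for the vec of the Kronecker product. Let $\mathbf{A}$ be of
dimension $m \times p$ and $\mathbf{B}$ be $r \times s$. Then

$$\operatorname{vec}(\mathbf{A} \otimes \mathbf{B}) = \left( \mathbf{I}_p \otimes \mathbf{K}_{s,m} \otimes \mathbf{I}_r \right) \left( \operatorname{vec} \mathbf{A} \otimes \operatorname{vec} \mathbf{B} \right). \tag{2.73}$$

Let $\mathbf{Y} = \mathbf{A} \otimes \mathbf{B}$. Differentiate,

$$d\mathbf{Y} = (d\mathbf{A} \otimes \mathbf{B}) + (\mathbf{A} \otimes d\mathbf{B}) \tag{2.74}$$

and vec

$$d \operatorname{vec} \mathbf{Y} = \left( \mathbf{I}_p \otimes \mathbf{K}_{s,m} \otimes \mathbf{I}_r \right) \left[ \left( d \operatorname{vec} \mathbf{A} \otimes \operatorname{vec} \mathbf{B} \right) + \left( \operatorname{vec} \mathbf{A} \otimes d \operatorname{vec} \mathbf{B} \right) \right]. \tag{2.75}$$

With some ingenious simplifications (Magnus and Neudecker 1985), this reduces
to (2.72) with

\begin{align}
\mathbf{Z}_1 &= \left( \mathbf{I}_p \otimes \mathbf{K}_{s,m} \otimes \mathbf{I}_r \right) \left( \mathbf{I}_{mp} \otimes \operatorname{vec} \mathbf{B} \right) \tag{2.76} \\
\mathbf{Z}_2 &= \left( \mathbf{I}_p \otimes \mathbf{K}_{s,m} \otimes \mathbf{I}_r \right) \left( \operatorname{vec} \mathbf{A} \otimes \mathbf{I}_{rs} \right). \tag{2.77}
\end{align}

Substituting $\mathbf{Z}_1$ and $\mathbf{Z}_2$ into (2.72) gives the differential of the Kronecker product
in terms of the differentials of its component matrices.
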
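5. The matrix inverse. The inverse of $\mathbf{X}$ satisfies

$$\mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{2.78}$$

Differentiate both sides

$$(d\mathbf{X})\mathbf{X}^{-1} + \mathbf{X}\left( d\mathbf{X}^{-1} \right) = \mathbf{0}, \tag{2.79}$$

then vec

$$\left[ \left( \mathbf{X}^{-1} \right)^{\mathsf{T}} \otimes \mathbf{I} \right] d \operatorname{vec} \mathbf{X} + \left[ \mathbf{I} \otimes \mathbf{X} \right] d \operatorname{vec} \mathbf{X}^{-1} = \mathbf{0} \tag{2.80}$$

and finally solve for $d \operatorname{vec} \mathbf{X}^{-1}$

$$d \operatorname{vec} \mathbf{X}^{-1} = -\left[ \mathbf{I} \otimes \mathbf{X} \right]^{-1} \left[ \left( \mathbf{X}^{-1} \right)^{\mathsf{T}} \otimes \mathbf{I} \right] d \operatorname{vec} \mathbf{X}. \tag{2.81}$$

The properties (2.5) and (2.8) of the Kronecker product let this be simplified to

$$d \operatorname{vec} \mathbf{X}^{-1} = -\left[ \left( \mathbf{X}^{-1} \right)^{\mathsf{T}} \otimes \mathbf{X}^{-1} \right] d \operatorname{vec} \mathbf{X}. \tag{2.82}$$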


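6. The square root and ratios. In calculating standard deviations and coefficients of
variation it is useful to calculate the elementwise square root and the elementwise
ratio of two vectors. If $\mathbf{x}$ is a non-negative vector, and the square root $\sqrt{\mathbf{x}}$ is taken
elementwise, then

$$d\sqrt{\mathbf{x}} = \frac{1}{2}\, \mathcal{D}\left( \sqrt{\mathbf{x}} \right)^{-1} d\mathbf{x}. \tag{2.83}$$

For the elementwise ratio, let $\mathbf{x}$ and $\mathbf{y}$ be $m \times 1$ vectors, with $\mathbf{y}$ nonzero. Let $\mathbf{w}$ be
a vector whose ith element is $x_i / y_i$; i.e., $\mathbf{w} = \mathcal{D}(\mathbf{y})^{-1}\mathbf{x}$. Then

$$d\mathbf{w} = \mathcal{D}(\mathbf{y})^{-1} d\mathbf{x} - \left[ \mathbf{x}^{\mathsf{T}} \mathcal{D}(\mathbf{y})^{-1} \otimes \mathcal{D}(\mathbf{y})^{-1} \right] \mathcal{D}(\operatorname{vec} \mathbf{I}_m) \left( \mathbf{1}_m \otimes \mathbf{I}_m \right) d\mathbf{y}. \tag{2.84}$$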


This list could go on. The books by Magnus and Neudecker (1988) and Abadir 
and Magnus (2005) contain many other results, and demographically relevant 
derivations appear throughout this book, especially in Chap. 5. 
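
Results of this kind are easy to check numerically before they are used. As one example, the sketch below compares the differential of the matrix inverse (item 5 above) with the actual change in the inverse after a small perturbation. It is written in plain Python for a 2 x 2 matrix; the helper names, the matrix, and the perturbation are assumptions made for the illustration.

```python
# Sketch: check dvec(X^{-1}) = -[(X^{-1})^T kron X^{-1}] dvec(X) on a 2 x 2 example.

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def vec(A):
    """Column-stacking vec operator, returned as a flat list."""
    return [x for col in zip(*A) for x in col]

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

def inv2(X):
    """Inverse of a 2 x 2 matrix by the closed-form adjugate formula."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

X = [[2.0, 1.0], [0.5, 3.0]]                     # made-up invertible matrix
dX = [[1e-6, -2e-6], [3e-6, 0.5e-6]]             # small made-up perturbation

X_inv = inv2(X)
J = [[-x for x in row] for row in kron(transpose(X_inv), X_inv)]
predicted = matvec(J, vec(dX))                   # first-order prediction

X_pert = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]
actual = [p - q for p, q in zip(vec(inv2(X_pert)), vec(X_inv))]

max_error = max(abs(p - a) for p, a in zip(predicted, actual))
print(predicted, actual, max_error)
```

The remaining discrepancy is of second order in the size of the perturbation, as expected for a first-order differential.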


2.9 LTRE Decomposition of Demographic Differences 


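The LTRE decomposition in Sect. 1.3.1 extends readily to matrix calculus. Suppose
that a demographic outcome $\boldsymbol{\xi}$, dimension $(s \times 1)$, is a function of a vector $\boldsymbol{\theta}$
of parameters, dimension $(p \times 1)$. Suppose that results are obtained under two
“conditions,” with parameters $\boldsymbol{\theta}^{(1)}$ and $\boldsymbol{\theta}^{(2)}$. Define the parameter difference as
$\Delta\boldsymbol{\theta} = \boldsymbol{\theta}^{(2)} - \boldsymbol{\theta}^{(1)}$ and the effect as $\Delta\boldsymbol{\xi} = \boldsymbol{\xi}^{(2)} - \boldsymbol{\xi}^{(1)}$. Then, to first order,

\begin{align}
\Delta\boldsymbol{\xi} &\approx \sum_{i} \frac{d\boldsymbol{\xi}}{d\theta_i}\, \Delta\theta_i \tag{2.85} \\
&= \frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}}\, \Delta\boldsymbol{\theta}. \tag{2.86}
\end{align}

Writing

$$\Delta\boldsymbol{\theta} = \mathcal{D}(\Delta\boldsymbol{\theta})\, \mathbf{1}_p, \tag{2.87}$$

we create a contribution matrix $\mathbf{C}$, of dimension $s \times p$,

$$\mathbf{C} = \frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf{T}}}\, \mathcal{D}(\Delta\boldsymbol{\theta}). \tag{2.88}$$

The (i, j) entry of $\mathbf{C}$ is the contribution of $\Delta\theta_j$ to the difference $\Delta\xi_i$, for $i = 1, \ldots, s$
and $j = 1, \ldots, p$. The rows and columns of $\mathbf{C}$ give

\begin{align}
\mathbf{C}(i, :) &= \text{contributions of } \Delta\boldsymbol{\theta} \text{ to } \Delta\xi_i \tag{2.89} \\
\mathbf{C}(:, j) &= \text{contributions of } \Delta\theta_j \text{ to } \Delta\boldsymbol{\xi} \tag{2.90}
\end{align}

When calculating $\mathbf{C}$, the derivative of $\boldsymbol{\xi}$ must be evaluated somewhere. Experience
suggests that evaluating it at the midpoint between $\boldsymbol{\theta}^{(1)}$ and $\boldsymbol{\theta}^{(2)}$ gives good
results (Logofet and Lesnaya 1997; Caswell 2001).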
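
A small numerical sketch may help fix the idea of the contribution matrix. The Python code below uses a made-up two-parameter outcome function with an analytic Jacobian (both are assumptions for illustration only), evaluates the derivative at the midpoint, and compares the row sums of C with the observed differences.

```python
# Sketch: LTRE contribution matrix C = (d xi / d theta^T) D(delta theta),
# with the derivative evaluated at the midpoint of the two parameter vectors.

def xi(theta):
    """Hypothetical outcome vector: a product and a sum of two parameters."""
    t1, t2 = theta
    return [t1 * t2, t1 + t2]

def jacobian(theta):
    """Analytic derivative d xi / d theta^T of the toy function above."""
    t1, t2 = theta
    return [[t2, t1],
            [1.0, 1.0]]

theta_1 = [0.5, 2.0]     # parameters under condition 1 (made-up values)
theta_2 = [0.75, 3.0]    # parameters under condition 2 (made-up values)

delta_theta = [b - a for a, b in zip(theta_1, theta_2)]
midpoint = [(a + b) / 2 for a, b in zip(theta_1, theta_2)]

J = jacobian(midpoint)
# Multiplying by D(delta_theta) on the right scales column j of J by delta_theta[j].
C = [[J[i][j] * delta_theta[j] for j in range(2)] for i in range(2)]

approx = [sum(row) for row in C]                          # row sums of C
exact = [b - a for a, b in zip(xi(theta_1), xi(theta_2))]  # observed differences
print(C, approx, exact)
```

Because this toy outcome is at most quadratic in the parameters, the midpoint evaluation reproduces the observed differences exactly; for a general nonlinear outcome the row sums of C are only a first-order approximation.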


2.10 A Protocol for Sensitivity Analysis 


The calculations may grow to be complex, but the protocol is simple: 


1. write a matrix expression for the outcome,
2. differentiate,
3. vec,
4. simplify,
5. calculate derivatives from the differentials, and
6. extend using the chain rule.


The rest of this book shows what can be done with this simple procedure.


Part II
Linear Models


Chapter 3
The Sensitivity of Population Growth Rate: Three Approaches


3.1 Introduction 


The essence of stable population theory is the fact that a population subject to
time-invariant vital rates will (with a few exceptions not of interest here) converge to
a stable structure and grow exponentially at a constant rate (the population growth
rate, or intrinsic rate of increase). The calculation of the population growth rate from
the vital rates is one of the most important accomplishments of formal demography
(Sharpe and Lotka 1911).¹ Ecologists recognized early on that, by integrating
survival and fertility over the life course, the population growth rate provided a
powerful tool for describing the population consequences of environmental conditions
(e.g., Birch 1953). For the same reason, evolutionary biologists recognized
it as a measure of fitness (Fisher 1930), although that concept requires careful
consideration of both demographic and genetic processes (Charlesworth 1994; de
Vries and Caswell 2018).

This makes the sensitivity analysis of population growth rate an important 
problem. It has been approached in three ways. The earliest approach (Hamilton 
1966) is specific to age-classified models, and relies on differentiation of the 
characteristic equation. The second (Caswell 1978) applies to stage-classified as 
well as age-classified models, and uses eigenvalue perturbation theory. The third is 
based on matrix calculus and is more flexible than its predecessors. 


Chapter 3 is modified, under the terms of a Creative Commons Attribution License, from Caswell, 
H. 2010. Reproductive value, the stable stage distribution, and the sensitivity of the population 
growth rate to changes in vital rates. Demographic Research 23:531-548, ©Hal Caswell. 
¹ Leonhard Euler had obtained the result in 1760, but his derivation was not rediscovered until 1970
(Keyfitz and Keyfitz 1970).


3.2 Hamilton’s Equation for Age-Classified Populations 


Consider an age-classified model, in which age $x$ is a continuous variable, with
mortality rate $\mu(x)$ and maternity function $m(x)$. The survivorship function is

$$\ell(x) = \exp\left( -\int_0^x \mu(a)\, da \right) \tag{3.1}$$

and the population growth rate $r$ is the solution to the Euler-Lotka equation

$$1 = \int_0^\infty e^{-ra}\, \ell(a)\, m(a)\, da. \tag{3.2}$$

The stable age distribution, reproductive value function, birth rate, and generation
time (mean age of reproduction in the stable population) are given by

\begin{align}
c(x) &= \frac{e^{-rx}\, \ell(x)}{\int_0^\infty e^{-ra}\, \ell(a)\, da} && \text{stable age distribution} \tag{3.3} \\
v(x) &= \frac{e^{rx}}{\ell(x)} \int_x^\infty e^{-ra}\, \ell(a)\, m(a)\, da && \text{reproductive value} \tag{3.4} \\
b &= \left[ \int_0^\infty e^{-ra}\, \ell(a)\, da \right]^{-1} && \text{birth rate} \tag{3.5} \\
\bar{A} &= \int_0^\infty a\, e^{-ra}\, \ell(a)\, m(a)\, da && \text{generation time} \tag{3.6}
\end{align}

Sensitivity of r. Hamilton (1966) derived the sensitivities of $r$ to changes in
mortality and fertility at a specified age $x$. His results are equivalent to

\begin{align}
\frac{dr}{d\mu(x)} &= \frac{-c(x)\, v(x)}{b \bar{A}} \tag{3.7} \\
\frac{dr}{dm(x)} &= \frac{c(x)}{b \bar{A}} \tag{3.8}
\end{align}

That is, the sensitivity of $r$ to a change in mortality at age $x$ is proportional to the
product of the reproductive value at age $x$ and the abundance of age $x$ in the stable
age distribution. The sensitivity of $r$ to a change in fertility at age $x$ is proportional
to the stable age distribution (and the reproductive value at age 0, which equals 1).
The proportionality constant in each case is the inverse of the product of the birth
rate and the mean age of reproduction.

Derivation. Hamilton’s results are obtained by implicit differentiation of the
Euler-Lotka equation (3.2). We will derive Hamilton’s original formulation and then
show how it reduces to the relation between the stable age distribution and the
reproductive value distribution in (3.7) and (3.8).

First, introduce a perturbation parameter $\theta$ to measure the change in mortality or
fertility at the specified age. Writing survival, fertility, and $r$ as functions of $\theta$ gives
the Euler-Lotka equation

$$1 = \int_0^\infty e^{-r(\theta) a}\, \ell(\theta, a)\, m(\theta, a)\, da. \tag{3.9}$$

Differentiating both sides of (3.9) with respect to $\theta$ gives

$$\begin{aligned}
0 = {} & -\frac{dr(\theta)}{d\theta} \int_0^\infty a\, e^{-r(\theta) a}\, \ell(\theta, a)\, m(\theta, a)\, da \\
& + \int_0^\infty e^{-r(\theta) a}\, \frac{d\ell(\theta, a)}{d\theta}\, m(\theta, a)\, da \\
& + \int_0^\infty e^{-r(\theta) a}\, \ell(\theta, a)\, \frac{dm(\theta, a)}{d\theta}\, da.
\end{aligned} \tag{3.10}$$

Solving (3.10) for $dr/d\theta$ gives

$$\frac{dr(\theta)}{d\theta} = \frac{1}{\bar{A}} \left[ \underbrace{\int_0^\infty e^{-r(\theta) a}\, \frac{d\ell(\theta, a)}{d\theta}\, m(\theta, a)\, da}_{\text{mortality}} + \underbrace{\int_0^\infty e^{-r(\theta) a}\, \ell(\theta, a)\, \frac{dm(\theta, a)}{d\theta}\, da}_{\text{fertility}} \right]. \tag{3.11}$$

Equation (3.11) has two terms, one capturing effects of $\theta$ on mortality and the other
capturing effects on fertility.


3.2.1 Effects of Changes in Mortality 


We want to perturb mortality at one exact age x (remember that age and time are 
continuous), leaving mortality at all other ages unchanged. To do this, we use the 
unit impulse function, or Dirac delta function. This is a generalized function defined 
by 


\begin{align}
\delta(x) &= 0, \qquad x \neq 0 \tag{3.12} \\
\int_{-\infty}^{\infty} \delta(s)\, ds &= 1. \tag{3.13}
\end{align}

The unit impulse is used in signal processing (e.g., Kamen and Heck 1997, p. 7)
to represent the limit of a perturbation of unit strength applied over a shorter and
shorter time interval. Think of a normal distribution with mean 0, in the limit as the
variance goes to 0, while the area under the curve remains at 1. The most useful
properties of the unit impulse, for our application, are

$$\int_{-\infty}^{\infty} \delta(a - x)\, f(a)\, da = f(x) \tag{3.14}$$

and

$$\int_{-\infty}^{x} \delta(s)\, ds = H(x) \tag{3.15}$$

where $H(x)$ is the Heaviside function, or unit step function, which satisfies $H(x) =
0$ for $x < 0$ and $H(x) = 1$ for $x > 0$.
We write mortality as 


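$$\mu(\theta, a) = \mu(0, a) + \theta\, \delta(a - x) \tag{3.16}$$

where $\delta(x)$ is the unit impulse function. The sensitivity of $r$ to $\mu(x)$ is obtained as
the derivative of $r$ with respect to $\theta$, evaluated at $\theta = 0$,

$$\frac{dr}{d\mu(x)} = \left. \frac{dr}{d\theta} \right|_{\theta = 0}. \tag{3.17}$$

Because only mortality is affected by $\theta$,

\begin{align}
\frac{dm(\theta, a)}{d\theta} &= 0 \tag{3.18} \\
\frac{d\mu(\theta, a)}{d\theta} &= \delta(a - x). \tag{3.19}
\end{align}

From (3.1),

\begin{align}
\frac{d\ell(\theta, a)}{d\theta} &= -\ell(\theta, a) \int_0^a \delta(s - x)\, ds \tag{3.20} \\
&= -\ell(\theta, a)\, H(a - x). \tag{3.21}
\end{align}

Substituting into (3.11) and evaluating at $\theta = 0$ gives

$$\frac{dr}{d\mu(x)} = -\frac{1}{\bar{A}} \int_x^\infty e^{-ra}\, \ell(a)\, m(a)\, da. \tag{3.22}$$

The integral in (3.22) is close to the reproductive value $v(x)$ given by (3.4);
specifically,

$$\int_x^\infty e^{-ra}\, \ell(a)\, m(a)\, da = \ell(x)\, e^{-rx}\, v(x). \tag{3.23}$$

However, from (3.3) and (3.5), $\ell(x)\, e^{-rx} = c(x)/b$. Making these substitutions
into (3.22) gives the formal relationship (3.7).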


3.2.2 Effects of Changes in Fertility 


Following the same approach, if the perturbation affects fertility at exact age x, we 
write 


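$$m(\theta, a) = m(0, a) + \theta\, \delta(a - x). \tag{3.24}$$

Because only fertility is affected by $\theta$, $d\mu(\theta, a)/d\theta = 0$ and $dm(\theta, a)/d\theta = \delta(a - x)$.
Substituting these into (3.11) and evaluating the result at $\theta = 0$ gives

$$\frac{dr}{dm(x)} = \frac{1}{\bar{A}} \left( e^{-rx}\, \ell(x) \right). \tag{3.25}$$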


From (3.3) and (3.5) it can be seen that the numerator is c(x)/b, which leads to the 
formal relationship (3.8). 


3.2.3 History and Perspectives 


Hamilton (1966) obtained the relationship (3.22) in his analysis of the evolution 
of senescence. From (3.22) and (3.8) it is apparent that (provided r > 0) the 
magnitudes of the sensitivities of r to mortality and fertility decline with age. These 
sensitivities measure the selection gradients on age-specific mortality and fertility. 
Thus Hamilton concluded that the strength of selection against deleterious mutations 
would necessarily decline with their age of action, that small positive effects at early 
ages could easily compensate for much larger negative effects at later ages, and that 
the evolution of senescence was therefore inevitable. 

In the years that followed Hamilton’s paper, several other authors developed 
perturbation analysis for r, using related methods. Demetrius (1969) used a discrete 
age-classified model, and Emlen (1970) used Hamilton’s results to derive the 
dynamics of gene frequencies resulting from the selection gradients on age-specific 
survival and fertility.

Keyfitz (1971), in a remarkable paper, used implicit differentiation to obtain the
sensitivity of population growth rate, life expectancy, birth rates, death rates, and 
the stable age distribution, apparently independently of Hamilton. He noted the 
appearance of reproductive value in the sensitivity of r to mortality. Goodman 
(1971) was apparently the first to note that the sensitivities of r to mortality and 
fertility could be expressed in terms of the stable age distribution and reproductive 
value. 

When Hamilton’s paper appeared, it was regarded as difficult and esoteric, but 
it had a great impact. It provided the analytical machinery for examining trade-offs 
between opposing demographic traits, known as antagonistic pleiotropy (Williams 
1957; Rose 1991). It also describes the accumulation of deleterious mutations due 
to the balance between mutation and selection (e.g., Steinsaltz et al. 2005). These 
ideas are fundamental to the analysis of human aging (e.g., Rose 1991; Wachter and 
Finch 1997; Carey and Tuljapurkar 2003; Baudisch 2008) and, more generally, the 
analysis of life history evolution in humans and other species (e.g., Charlesworth 
1994; Stearns 1992). 


3.3 Stage-Classified Populations: Eigenvalue Perturbations 


Implicit in Hamilton’s analysis is the assumption that the vital rates are functions 
of age. In many cases, they are not. In humans, characteristics such as education, 
marital status, health status, or spatial location, may provide important information 
in addition to age. In other species, the vital rates may depend on developmental 
stage or body size more than on age. Such populations are described by stage- 
classified demographic models, of which the age-classified theory is a special 
case. 

Stage-classified demography can be analyzed using matrix population models
(Leslie 1945; Caswell 2001). Let $\mathbf{n}(t)$ be the population vector at
time $t$, and $\mathbf{A}$ the population projection matrix, with

$$\mathbf{n}(t + 1) = \mathbf{A}\, \mathbf{n}(t). \tag{3.26}$$

The discrete-time population growth rate $\lambda$ is the dominant eigenvalue of $\mathbf{A}$
(guaranteed to be real and positive by the Perron-Frobenius theorem). The
stable stage distribution is given by the corresponding right eigenvector $\mathbf{w}$ and the
reproductive value function by the left eigenvector $\mathbf{v}$; they satisfy

\begin{align}
\mathbf{A}\mathbf{w} &= \lambda \mathbf{w} \tag{3.27} \\
\mathbf{v}^{\mathsf{T}} \mathbf{A} &= \lambda \mathbf{v}^{\mathsf{T}}. \tag{3.28}
\end{align}

Sensitivity of $\lambda$. The effects of perturbations on population growth are approached
by looking for the sensitivity of an eigenvalue to changes in the entries of a matrix.
We will see that the sensitivity of $\lambda$ to a change in the entry $a_{ij}$ of $\mathbf{A}$ is (Caswell
1978)

$$\frac{\partial \lambda}{\partial a_{ij}} = \frac{v_i w_j}{\mathbf{v}^{\mathsf{T}} \mathbf{w}}. \tag{3.29}$$

The entry $a_{ij}$ measures the per-capita production of stage $i$ by stage $j$. Thus the
effect of a change in $a_{ij}$ is proportional to the reproductive value of the destination
stage and to the abundance of the origin stage in the stable population. This
is a generalization of the relationships (3.7) and (3.8) obtained from Hamilton’s
analysis.


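Derivation. The eigenvalue $\lambda$ is a solution to the characteristic equation of $\mathbf{A}$, which
generalizes the Euler-Lotka equation (3.2). Except in special cases, however, the
characteristic equation cannot be written down explicitly, making the implicit
differentiation approach used by Hamilton impossible. Instead, the relationship (3.29)
is obtained by a perturbation expansion. Suppose that $\mathbf{A}$ is perturbed to $\mathbf{A} + \Delta\mathbf{A}$.
This will result in perturbations of $\lambda$ and of $\mathbf{w}$, which must satisfy

$$(\mathbf{A} + \Delta\mathbf{A})(\mathbf{w} + \Delta\mathbf{w}) = (\lambda + \Delta\lambda)(\mathbf{w} + \Delta\mathbf{w}). \tag{3.30}$$

Expanding the products, setting second order terms to zero, and remembering that
$\mathbf{A}\mathbf{w} = \lambda\mathbf{w}$ gives

$$\mathbf{A}(\Delta\mathbf{w}) + (\Delta\mathbf{A})\mathbf{w} = \lambda(\Delta\mathbf{w}) + (\Delta\lambda)\mathbf{w}. \tag{3.31}$$

Multiply on the left by $\mathbf{v}^{\mathsf{T}}$ and simplify to obtain

$$(\Delta\lambda)\, \mathbf{v}^{\mathsf{T}} \mathbf{w} = \mathbf{v}^{\mathsf{T}} (\Delta\mathbf{A})\, \mathbf{w}. \tag{3.32}$$

If the perturbation affects only one entry, say $a_{ij}$, of $\mathbf{A}$, then

$$\Delta\lambda = \frac{v_i w_j\, (\Delta a_{ij})}{\mathbf{v}^{\mathsf{T}} \mathbf{w}}. \tag{3.33}$$

Dividing both sides by $\Delta a_{ij}$ and taking the limit as $\Delta a_{ij} \to 0$ gives the sensitivity
result (3.29).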
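
The sensitivity formula can be checked against a direct numerical perturbation. The following sketch, in plain Python, uses simple power iteration to obtain the dominant eigenvalue and the right and left eigenvectors; the 3 x 3 projection matrix, the helper names, and the number of iterations are assumptions chosen for illustration, and power iteration is only one of several ways to compute these quantities.

```python
# Sketch: eigenvalue sensitivities v_i w_j / (v^T w), checked by finite differences.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dominant_pair(A, iters=500):
    """Power iteration: dominant eigenvalue and right eigenvector (entries sum to 1).
    Assumes A is non-negative and primitive, so the iteration converges."""
    x = [1.0 / len(A)] * len(A)
    lam = 0.0
    for _ in range(iters):
        y = matvec(A, x)
        lam = sum(y)               # equals lambda once x has converged, since sum(x) == 1
        x = [yi / lam for yi in y]
    return lam, x

def sensitivity_matrix(A):
    """Return lambda and the matrix whose (i, j) entry is v_i w_j / (v^T w)."""
    lam, w = dominant_pair(A)
    _, v = dominant_pair(transpose(A))   # left eigenvector = right eigenvector of A^T
    vw = sum(vi * wi for vi, wi in zip(v, w))
    return lam, [[vi * wj / vw for wj in w] for vi in v]

# Hypothetical stage-classified projection matrix (made-up entries).
A = [[0.0, 1.0, 2.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.6, 0.8]]
lam, S = sensitivity_matrix(A)

# Finite-difference check on the entry a_32 (Python index [2][1]).
h = 1e-6
A_pert = [row[:] for row in A]
A_pert[2][1] += h
fd = (dominant_pair(A_pert)[0] - lam) / h
print(lam, S[2][1], fd)
```

The scaling of the eigenvectors does not matter here, because any rescaling of v or w cancels in the ratio.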


3.3.1 Age-Classified Models as a Special Case 


To compare (3.29) with Hamilton’s results (3.7) and (3.8), consider an age-classified
matrix (a Leslie matrix) with fertilities $F_i$ in the first row, survival probabilities $P_i$ on
the subdiagonal, and zeros elsewhere (Leslie 1945; Keyfitz 1968). In this case (3.29)
simplifies to

\begin{align}
\frac{\partial \lambda}{\partial P_i} &= \frac{v_{i+1}\, w_i}{\mathbf{v}^{\mathsf{T}} \mathbf{w}} \tag{3.34} \\
\frac{\partial \lambda}{\partial F_i} &= \frac{v_1\, w_i}{\mathbf{v}^{\mathsf{T}} \mathbf{w}}. \tag{3.35}
\end{align}

Equation (3.34) corresponds to (3.7); the sensitivity is proportional to the product of
the reproductive value and the stable stage distribution. Equation (3.35) corresponds
to (3.8), and shows why reproductive value is apparently missing from (3.8):
reproductive value at birth [$v(0)$ in Hamilton’s notation] is scaled to equal 1.


3.3.2 Sensitivity to Lower-Level Demographic Parameters 


The entries of $\mathbf{A}$ are often functions of other, lower-level parameters. The sensitivity
of $\lambda$ to these parameters is obtained by the chain rule. For example, suppose that
stage 1 may contribute individuals to stages 2 or 3 (Fig. 3.1). Write the transition
probabilities as

\begin{align}
a_{21} &= \gamma \sigma \tag{3.36} \\
a_{31} &= (1 - \gamma)\, \sigma \tag{3.37}
\end{align}

where $\sigma$ is the survival probability and $\gamma$ the probability that the individual moves
to stage 2, conditional on survival. Then the sensitivities of $\lambda$ to $\gamma$ and to $\sigma$ are given
by

\begin{align}
\frac{d\lambda}{d\sigma} &= \frac{\partial \lambda}{\partial a_{21}} \frac{da_{21}}{d\sigma} + \frac{\partial \lambda}{\partial a_{31}} \frac{da_{31}}{d\sigma} \tag{3.38} \\
&= \frac{w_1 \left[ \gamma v_2 + (1 - \gamma) v_3 \right]}{\mathbf{v}^{\mathsf{T}} \mathbf{w}} \tag{3.39} \\
\frac{d\lambda}{d\gamma} &= \frac{\partial \lambda}{\partial a_{21}} \frac{da_{21}}{d\gamma} + \frac{\partial \lambda}{\partial a_{31}} \frac{da_{31}}{d\gamma} \tag{3.40} \\
&= \frac{\sigma w_1 \left( v_2 - v_3 \right)}{\mathbf{v}^{\mathsf{T}} \mathbf{w}}. \tag{3.41}
\end{align}

The sensitivity to survival is proportional to the weighted average of the reproductive
values of the destination stages, and the sensitivity to the transition probability
$\gamma$ is proportional to the difference in reproductive value between the destination
stages.

Fig. 3.1 An example of lower-level parameters appearing in a portion of a life cycle. Individuals
in stage 1 survive with probability $\sigma$, and, conditional on survival, move to stage 2 with
probability $\gamma$ and to stage 3 with probability $1 - \gamma$


3.3.3 History 


I first encountered the basis for this perturbation expansion in a paper by C.A.
Desoer in the proceedings of an engineering conference (Desoer 1967).² Eigenvalue
perturbations were of particular interest to engineers in the 1960s as part of a
shift from frequency-domain methods to state-space methods in the study of linear
systems (Zadeh and Desoer 1963). However, the result dates back to Jacobi (1846),
and has been independently rediscovered many times (e.g., Faddeev 1959; Papoulis
1966; Franklin 1968). In population biology, this perturbation approach has been
extended to many other sensitivity problems, including the sensitivity of subdominant
eigenvalues and transient behavior, of growth rates in periodic and stochastic
environments, of the eigenvectors, and of the spreading speed in biological or
demographic invasions (see Caswell (2001) for reviews and references).


3.4 Growth Rate Sensitivity via Matrix Calculus 


Matrix calculus provides a still more general approach to the sensitivity analysis 
of the population growth rate. Equation (3.29) perturbs only a single entry of A; 
derivatives with respect to other parameters are assembled by summing their effects 
over all the entries of $\mathbf{A}$, as in (3.41). Using matrix calculus, we now consider $\lambda$ as
a scalar function of $\mathbf{A}$ and $\mathbf{A}$ as a matrix-valued function of a parameter vector $\boldsymbol{\theta}$.


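Sensitivity of $\lambda$ We will show that the derivative of $\lambda$ with respect to $\boldsymbol{\theta}$ is

$$\frac{d\lambda}{d\boldsymbol{\theta}^{\mathsf T}} = \left(\frac{\mathbf{w}^{\mathsf T} \otimes \mathbf{v}^{\mathsf T}}{\mathbf{v}^{\mathsf T}\mathbf{w}}\right) \frac{d\,\mathrm{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf T}}, \tag{3.42}$$

where $\otimes$ denotes the Kronecker product. If $\boldsymbol{\theta}$ is a $p \times 1$ vector of parameters, then
$d\lambda/d\boldsymbol{\theta}^{\mathsf T}$ is a $1 \times p$ matrix whose $i$th entry is $d\lambda/d\theta_i$.

² By a fortunate accident; I was searching for something completely different. We may wonder
whether the chances of such coincidences are higher or lower in the internet search era.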


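Derivation Following the steps in Chap. 2, begin by taking the differential of both
sides of (3.27) to give

$$(d\mathbf{A})\mathbf{w} + \mathbf{A}(d\mathbf{w}) = (d\lambda)\mathbf{w} + \lambda(d\mathbf{w}). \tag{3.43}$$

Multiply both sides on the left by $\mathbf{v}^{\mathsf T}$ and simplify (the terms in $d\mathbf{w}$ cancel because
$\mathbf{v}^{\mathsf T}\mathbf{A} = \lambda\mathbf{v}^{\mathsf T}$) to obtain

$$(d\lambda)\,\mathbf{v}^{\mathsf T}\mathbf{w} = \mathbf{v}^{\mathsf T}(d\mathbf{A})\,\mathbf{w}. \tag{3.44}$$
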
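Next, apply the vec operator to both sides of (3.44). Since the left side is a scalar, the
vec operator has no effect. The right side is a product of three quantities, so Roth’s
theorem implies that

$$(d\lambda)\,\mathbf{v}^{\mathsf T}\mathbf{w} = \left(\mathbf{w}^{\mathsf T} \otimes \mathbf{v}^{\mathsf T}\right) d\,\mathrm{vec}\,\mathbf{A}. \tag{3.45}$$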


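The First Identification Theorem then gives

$$\frac{d\lambda}{d\,\mathrm{vec}^{\mathsf T}\mathbf{A}} = \frac{\mathbf{w}^{\mathsf T} \otimes \mathbf{v}^{\mathsf T}}{\mathbf{v}^{\mathsf T}\mathbf{w}}. \tag{3.46}$$

Finally, the chain rule (2.18) gives us

$$\frac{d\lambda}{d\boldsymbol{\theta}^{\mathsf T}} = \frac{d\lambda}{d\,\mathrm{vec}^{\mathsf T}\mathbf{A}}\,\frac{d\,\mathrm{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf T}}. \tag{3.47}$$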
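
The Kronecker-product form of the result can be evaluated directly with NumPy. The sketch below is illustrative only: it assumes a hypothetical 3-stage matrix depending on $\boldsymbol{\theta} = (\sigma, \gamma)$ through $a_{21} = \sigma\gamma$ and $a_{31} = \sigma(1-\gamma)$, with all other entries and values invented, and it uses the column-stacking convention for vec.

```python
import numpy as np

def build_A(theta):
    # Hypothetical matrix; only the first column's dependence on
    # (sigma, gamma) is taken from the text.
    sigma, gamma = theta
    return np.array([
        [0.0,                   1.5, 4.0],
        [sigma * gamma,         0.3, 0.0],
        [sigma * (1.0 - gamma), 0.2, 0.6],
    ])

def dominant(A):
    # Dominant eigenvalue with right (w) and left (v) eigenvectors.
    vals, W = np.linalg.eig(A)
    k = np.argmax(vals.real)
    vals_left, V = np.linalg.eig(A.T)
    j = np.argmax(vals_left.real)
    return vals[k].real, np.abs(W[:, k].real), np.abs(V[:, j].real)

theta = np.array([0.5, 0.4])
sigma, gamma = theta
lam, w, v = dominant(build_A(theta))

# Eq. (3.46): derivative of lambda with respect to vec(A), a 1 x n^2 row.
dlam_dvecA = np.kron(w, v) / (v @ w)

# d vec(A) / d theta^T, an n^2 x 2 matrix (vec stacks columns).
dA_dsigma = np.zeros((3, 3))
dA_dsigma[1, 0] = gamma
dA_dsigma[2, 0] = 1.0 - gamma
dA_dgamma = np.zeros((3, 3))
dA_dgamma[1, 0] = sigma
dA_dgamma[2, 0] = -sigma
dvecA_dtheta = np.column_stack([dA_dsigma.reshape(-1, order="F"),
                                dA_dgamma.reshape(-1, order="F")])

# Eq. (3.47): chain rule gives the 1 x 2 sensitivity vector.
dlam_dtheta = dlam_dvecA @ dvecA_dtheta

# Central finite differences for comparison.
h = 1e-6
fd = np.array([(dominant(build_A(theta + h * e))[0]
                - dominant(build_A(theta - h * e))[0]) / (2 * h)
               for e in np.eye(2)])
```

Only the factor `dvecA_dtheta` changes when the parameterization changes; the factor from (3.46) depends on $\mathbf{A}$ alone.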


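The matrix calculus approach is particularly powerful because of the flexibility in
specifying the effect of $\boldsymbol{\theta}$ on the vital rates. Suppose that $\mathbf{A}$ depends on a vector $\boldsymbol{\sigma}$
of survival probabilities, which are a function of the concentration $X$ of a pollutant,
which in turn is changing as a function of time $t$. The rate of change of $\lambda$ over
time is

$$\frac{d\lambda}{dt} = \left(\frac{d\lambda}{d\,\mathrm{vec}^{\mathsf T}\mathbf{A}}\right)\left(\frac{d\,\mathrm{vec}\,\mathbf{A}}{d\boldsymbol{\sigma}^{\mathsf T}}\right)\left(\frac{d\boldsymbol{\sigma}}{dX}\right)\left(\frac{dX}{dt}\right) \tag{3.48}$$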


Each of the terms in (3.48) can be evaluated separately; the matrix product gives the 
correct dimension for the final sensitivity result (a 1 x 1 scalar in this case). 


3.5 Second Derivatives of Population Growth Rate 


The second derivatives of $\lambda$ measure the curvature of the response to changes in
parameters. They have important applications in evolutionary demography, where
they indicate the action of stabilizing, disruptive, or correlational selection on
fitness-related traits (e.g., Phillips and Arnold 1989; Caswell 2001), in adaptive
dynamics, where they help determine the stability of evolutionary singular strategies
(e.g., Diekmann 2004), and in extending sensitivity analysis to second-order effects.

Since the first derivatives of $\lambda$ are written, in Eqs. (3.29) and (3.46), in terms
of the right and left eigenvectors of $\mathbf{A}$, the second derivatives of $\lambda$ require the first
derivatives of those eigenvectors. Caswell (1996) derived the second derivatives of $\lambda$
to entries of $\mathbf{A}$ by an extension of the method in Sect. 3.3. However, a more general
and rigorous method is available using matrix calculus.

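Consider a (scalar) variable $\xi$ which is a function of a vector $\boldsymbol{\theta}$ of parameters.
The complete set of second derivatives of $\xi$ are given by the Hessian matrix

$$\mathbf{H} = \left(\frac{\partial^2 \xi}{\partial\theta_i\,\partial\theta_j}\right) \tag{3.49}$$

Magnus and Neudecker (1988) proved (their Second Identification Theorem) that if
the second differential of $\xi$ can be written as

$$d^2\xi = d\boldsymbol{\theta}^{\mathsf T}\,\mathbf{B}\,d\boldsymbol{\theta} \tag{3.50}$$

for some matrix $\mathbf{B}$, then

$$\mathbf{H} = \frac{1}{2}\left(\mathbf{B} + \mathbf{B}^{\mathsf T}\right). \tag{3.51}$$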
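
As a small worked illustration (not taken from the source), consider the hypothetical quadratic $\xi = \boldsymbol{\theta}^{\mathsf T}\mathbf{C}\boldsymbol{\theta}$, where $\mathbf{C}$ is an assumed constant matrix that need not be symmetric. Taking differentials twice, with $d\boldsymbol{\theta}$ held constant,

$$d\xi = (d\boldsymbol{\theta})^{\mathsf T}\mathbf{C}\boldsymbol{\theta} + \boldsymbol{\theta}^{\mathsf T}\mathbf{C}\,d\boldsymbol{\theta}, \qquad d^2\xi = 2\,(d\boldsymbol{\theta})^{\mathsf T}\mathbf{C}\,d\boldsymbol{\theta},$$

so that $\mathbf{B} = 2\mathbf{C}$ in (3.50), and (3.51) gives

$$\mathbf{H} = \frac{1}{2}\left(2\mathbf{C} + 2\mathbf{C}^{\mathsf T}\right) = \mathbf{C} + \mathbf{C}^{\mathsf T}.$$

The symmetrization in (3.51) matters here: reading the Hessian directly as $\mathbf{B} = 2\mathbf{C}$ would be correct only when $\mathbf{C}$ is symmetric.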


Shyu and Caswell (2014) used this approach to derive the second derivatives of the
population growth rate $\lambda$, the continuous-time population growth rate $r = \log\lambda$,
and the net reproductive rate $R_0$, to changes in either the entries of $\mathbf{A}$ or to arbitrary
lower-level parameters of which A is a function. We will not explore second 
derivatives in this book, but Shyu’s other work (Shyu and Caswell 2016a,b) applies 
them to analyze the evolutionary demography of sex ratios, and Caswell and Shyu 
(2017) use them to analyze the effects of mortality on the selection gradients on 
senescence. 


3.6 Conclusion 


Each of the three approaches to growth rate sensitivity, leading to Eqs. (3.7), 
(3.8), (3.29), and (3.42), uses its own analytical methods. They agree, however, in 
showing how the sensitivity of population growth rate can be written in terms of 
the stable stage distribution and the reproductive value. In general, the effect of a 
change in the rate at which individuals move from stage j to stage i is proportional 
to the abundance of the origin stage (j) and the reproductive value of the destination 
stage (i). If a transition yields individuals with low reproductive value, or if few 
individuals are available to experience the change in the rate of transition, the effect 
on population growth will be small.


Chapter 4
Sensitivity Analysis of Longevity and Life Disparity


4.1 Introduction 


The population growth rate ($\lambda$ or $r$) analyzed in Chap. 3 is a population-level
consequence of the individual-level vital rates. A similarly basic outcome, at 
the individual or cohort level, is longevity: the length of individual life. The 
most commonly encountered description of longevity is its expectation, the life 
expectancy. However, longevity is a random variable, differing among individuals 
(even when those individuals are subject to the same rates and hazards) because 
of the random vagaries of mortality and survival. Therefore, it is important to also 
consider its variance and higher moments. This chapter introduces the sensitivity 
analysis of longevity, which will be explored in more detail in Chaps. 5, 11, and 12. 

As in Chap. 3, we will begin by reviewing a classic formula for the sensitivity of 
life expectancy in age-classified models. Then we will use matrix calculus to derive
more general formulas for the moments of longevity, the distribution of age or stage 
at death, and the life disparity, applicable to age- or stage-classified populations. 


4.2 Life Expectancy in Age-Classified Populations 


Notation It is customary to denote life expectancy by symbols like $e_x$ or $e(x)$, but
in general the symbol $e$ plays too many roles in mathematics to be helpful for our
purposes. So, when we make the transition to matrix formulations, I will use the
symbol $\eta$, in various vector and scalar manifestations, to indicate longevity.

Perturbation analysis of longevity has been pursued mostly within the framework 
of age-classified life cycles (e.g., Canudas Romo 2003; Keyfitz 1971; Pollard 1982; 
Vaupel 1986; Vaupel and Canudas Romo 2003). The life expectancy at age x is 
given by

$$e(x) = \frac{1}{\ell(x)} \int_x^\infty \ell(s)\,ds \tag{4.1}$$

where the survivorship function $\ell(x)$ is the probability of survival to age $x$.
The classical result for the sensitivity of life expectancy at birth to a change in
mortality at age $a$ is

$$\frac{de(0)}{d\mu(a)} = -\ell(a)\,e(a). \tag{4.2}$$


That is, the sensitivity of life expectancy at birth to a change in mortality at age a is 
equal to the product of the probability of survival to age a and the life expectancy 
at age a. In other words, e(0) is most sensitive to changes in mortality at ages 
to which lots of individuals survive (to experience the change in mortality) and 
beyond which there is lots of longevity remaining (so they can enjoy the change 
in mortality). The derivative is negative because increasing mortality reduces life 
expectancy. 

The result was presented independently by Keyfitz (1971) who also referenced 
some earlier approaches (Wilson 1938; Irwin 1949) and by Pollard (1982). Keyfitz’s 
derivation was sketchy, and Pollard simply stated that the result was well-known, 
and gave no derivation. From a general sensitivity analysis perspective, we can 
derive the result using the same approach applied in Chap. 3 to population growth 
rate. 


4.2.1 Derivation 


Differentiating (4.1) with respect to mortality at some specified age $a$ gives

$$\frac{de(0)}{d\mu(a)} = \int_0^\infty \frac{d\ell(s)}{d\mu(a)}\,ds \tag{4.3}$$

and our problem reduces to finding the derivative of $\ell(s)$ with respect to $\mu(a)$. To
do so, introduce a parameter $\theta$ to measure the size of the perturbation at age $a$, and
write mortality as

$$\mu(x, \theta) = \mu(x, 0) + \theta\,\delta(x - a) \tag{4.4}$$

where $\delta(x - a)$ is the Dirac delta function.¹ The derivative with respect to $\mu(a)$ is
obtained by differentiating with respect to $\theta$ and evaluating the result at $\theta = 0$.


¹ See Chap. 3 for a description of this generalized function.

Write survivorship as

$$\ell(x, \theta) = \exp\left[-\int_0^x \mu(z, \theta)\,dz\right] \tag{4.5}$$

so that

$$\frac{d\ell(x, \theta)}{d\theta} = -\ell(x, \theta) \int_0^x \frac{d\mu(z, \theta)}{d\theta}\,dz. \tag{4.6}$$

From (4.4) we have

$$\frac{d\mu(z, \theta)}{d\theta} = \delta(z - a) \tag{4.7}$$

so that

$$\frac{d\ell(x, \theta)}{d\theta} = -\ell(x, \theta) \int_0^x \delta(z - a)\,dz \tag{4.8}$$

$$= -\ell(x, \theta)\,H(x - a) \tag{4.9}$$

where $H(\cdot)$ is the unit step function. Substituting this into (4.3) and evaluating at
$\theta = 0$ gives

$$\frac{de(0)}{d\mu(a)} = -\int_0^\infty \ell(s)\,H(s - a)\,ds \tag{4.10}$$

$$= -\int_a^\infty \ell(s)\,ds \tag{4.11}$$

which, by (4.1), is equal to (4.2).
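
The result can be illustrated numerically. The sketch below is an assumption-laden example, not part of the original derivation: it uses a hypothetical Gompertz hazard, a grid-based trapezoid rule, and a narrow rectangular bump of unit area as a stand-in for the Dirac delta at age $a = 60$. Because the bump has finite width, agreement with (4.2) is only approximate.

```python
import numpy as np

x = np.linspace(0.0, 150.0, 15001)       # age grid, step 0.01 years
dx = x[1] - x[0]
mu = 1e-4 * np.exp(0.09 * x)             # hypothetical Gompertz hazard

def trapezoid(y):
    # Trapezoid rule on the uniform grid.
    return np.sum(0.5 * (y[1:] + y[:-1])) * dx

def survivorship(hazard):
    # Eq. (4.5): l(x) = exp(-cumulative hazard), by cumulative trapezoid rule.
    cum = np.concatenate(([0.0],
                          np.cumsum(0.5 * (hazard[1:] + hazard[:-1]) * dx)))
    return np.exp(-cum)

ell = survivorship(mu)
ia = 6000                                 # grid index of age a = 60
e_a = trapezoid(ell[ia:]) / ell[ia]       # Eq. (4.1) at x = a
keyfitz_pollard = -ell[ia] * e_a          # Eq. (4.2)

# Unit-area bump of width 0.1 years approximating delta(x - a).
width_cells = 10
bump = np.zeros_like(x)
bump[ia:ia + width_cells] = 1.0 / (width_cells * dx)

# Central finite difference of e(0) with respect to the perturbation size.
theta = 1e-4
fd = (trapezoid(survivorship(mu + theta * bump))
      - trapezoid(survivorship(mu - theta * bump))) / (2 * theta)
```

Narrowing the bump (with a correspondingly finer grid) should bring `fd` closer to `keyfitz_pollard`.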


4.3 A Markov Chain Model for the Life Cycle 


Age has a special status in demography because it is continuous, linear, and permits 
movement in only one direction and at one rate (age increases by one unit for every 
unit of time). All other demographic characteristics have the potential for much 
greater flexibility, and the operators that describe movement and development of 
individuals require an equal degree of flexibility. This book is devoted to matrix 
formulations of these problems, which have the great advantage of permitting 
both age and stage-classified models. The basic formulation, as far as longevity 
is concerned, is that of a finite-state absorbing Markov chain.


4.3.1 A Markov Chain Formulation of the Life Cycle


We describe the life cycle as an absorbing Markov chain. This approach was pio- 
neered in demography by Feichtinger (1971) and Hoem (1969), and has been greatly 
extended in recent years (Caswell 2001, 2006, 2009; Horvitz and Tuljapurkar 2008; 
Tuljapurkar and Horvitz 2006; Steinsaltz and Evans 2004). Good sources for the 
basic theory of absorbing Markov chains are Kemeny and Snell (1976) and Iosifescu 
(1980). 

These models will be explored in more detail in Chaps. 5 and 11. The sensitivity 
analysis of measures of variance in longevity has been developed by Van Raalte 
and Caswell (2013) and Engelman et al. (2014). An important extension of Markov 
chain models for longevity is the incorporation of “rewards” to represent the value, 
in some sense, of the length of life, extending methods developed for dynamic 
programming (Howard 1960). The rewards include the production of offspring 
(Caswell 2011; van Daalen and Caswell 2015, 2017), the accumulation of income 
and expenditures (Caswell and Kluge 2015) and healthy longevity (Caswell and 
Zarulli 2018). The sensitivity analysis of these important models is derived in van 
Daalen and Caswell (2017). 

Markov chain theory distinguishes between recurrent and transient states. A 
recurrent state has the property that the probability of returning to that state at least 
once is 1. A transient state is one for which that probability is less than 1. If a Markov 
chain contains transient states, it will eventually leave those states and arrive in a 
recurrent state or class of states, where it will remain permanently. Such a chain 
is called absorbing. Absorbing chains are the basic model for the demography of 
individuals because life is inherently transient. Any individual will, with probability 
one, eventually leave the set of living states and be absorbed by death. 

If a Markov chain consists of a single set of recurrent states that all communicate 
with each other, it is said to be ergodic. The transition matrix for an ergodic chain is 
irreducible and primitive. Ergodic Markov chains play a limited role in demographic 
contexts because they cannot include mortality. Chapter 11 will, however, present 
the sensitivity analysis of these models. 

In demographic models, individuals move among a set of transient (i.e., living) 
states in their life cycle before they eventually reach an absorbing state (death). 
Transient states may represent age classes, developmental or life history stages, or 
states defined by health, employment, economic, or other kinds of status. In studying 
longevity, we are particularly interested in absorbing states representing death, or 
perhaps death classified by age or stage at death, or by cause of death. The analysis 
applies equally to other ways of leaving the life cycle (e.g., graduation in a model
of educational states, discharge from treatment in a model of health states).

Number the stages in the life cycle so that the transient states are $1, \ldots, s$ and
the absorbing states are $s + 1, \ldots, s + a$. Then the transition matrix of the Markov
chain is

$$\mathbf{P} = \left(\begin{array}{c|c} \mathbf{U} & \mathbf{0} \\ \hline \mathbf{M} & \mathbf{I} \end{array}\right) \tag{4.12}$$


Here, U is the s x s matrix of transition probabilities among the transient states. 
The a x s matrix M gives the probabilities of absorption in each of the absorbing 
states. The columns of P sum to one. I assume that the spectral radius (the dominant 
eigenvalue) of U is strictly less than one; a sufficient condition for this is that there 
is a non-zero probability of ultimate death for every stage. 

Age-classified models are a special case with survival probabilities on the
subdiagonal (and possibly in the last diagonal entry); e.g., for $s = 3$,

$$\mathbf{U} = \begin{pmatrix} 0 & 0 & 0 \\ p_1 & 0 & 0 \\ 0 & p_2 & [p_3] \end{pmatrix} \tag{4.13}$$

The age-specific survival probability is $p_i = e^{-\mu_i}$, with $\mu_i$ a mortality rate applying
to age class $i$. The $(s, s)$ entry of $\mathbf{U}$ is an age-independent survival probability for a
final open-ended age class, with a remaining life expectancy of $1/(1 - p_s)$. If $p_s = 0$
no one survives beyond age class $s$. When the age-classified model is constructed
from a life table, $p_i = 1 - q_{i-1}$; that is, the survival of age class 1 is the complement
of the probability of death between age 0 and 1.

The mortality matrix M gives the probabilities of transition from each of the 
transient states to each of the absorbing states. Figure 4.1 shows some examples of 
life cycle formulations that can arise, including both age and stage classification in 
the transient states, and absorbing states classified by age at death, grouped ages at 
death, stage at death, or cause of death. The resulting mortality matrices are

Fig. 4.1 Life cycle graphs showing some alternative choices for structure of the absorbing state:
death, age at death, stage at death, or cause of death. (a) Age-classified with one dead state. (b) 
Age-classified, age at death. (c) Age-classified, grouped ages at death. (d) Stage-classified, stage 
at death. (e) Age-classified, causes of death 


Figure 4.1e: $\mathbf{M}$ = [2 × 4 matrix; entries illegible in the source] (4.18)


The beauty of formulating longevity as a Markov chain is that many statistics of
longevity can be written in terms of the matrices $\mathbf{U}$ and $\mathbf{M}$ and sensitivity analysis
can be carried out using matrix calculus.


4.3.2 Occupancy Times 


Consider an individual in transient state $j$. Eventual absorption is certain. But before
that, the individual will occupy various transient states. The number of such visits,
the occupancy time,² is the basic unit of longevity. Occupancy is particularly central
in studies of health demography, where it quantifies the parts of a life spent in
different health states. But, even without the added dimension of something like
health, occupancy of transient states is the basis of longevity analysis.

Let $\nu_{ij}$ be the number of visits to transient state $i$ by an individual in transient
state $j$, prior to absorption. Its expectation is given by the fundamental matrix (e.g.,
Kemeny and Snell 1976; Iosifescu 1980)

$$\mathbf{N} = \left(E(\nu_{ij})\right) \tag{4.19}$$

$$= (\mathbf{I} - \mathbf{U})^{-1} \tag{4.20}$$


More details, and examples, for the higher moments and variances of occupancy 
times are given in Chaps. 5 and 11. 
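
A minimal sketch of the fundamental matrix calculation, assuming a hypothetical three-age-class model with an open-ended last class; the survival probabilities are invented for illustration.

```python
import numpy as np

# Hypothetical survival probabilities p1, p2 and open-ended-class survival p3.
p = np.array([0.9, 0.8, 0.5])

# Age-classified transient matrix in the form of Eq. (4.13).
U = np.array([
    [0.0,  0.0,  0.0],
    [p[0], 0.0,  0.0],
    [0.0,  p[1], p[2]],
])

# Eq. (4.20): expected number of visits to state i, starting from state j.
N = np.linalg.inv(np.eye(3) - U)

# Column 0 is approximately [1, 0.9, 1.44]: an individual in age class 1
# spends one interval there, 0.9 intervals in class 2 on average, and
# 0.9 * 0.8 / (1 - 0.5) = 1.44 intervals in the open-ended class.
```

The open-ended class contributes $1/(1 - p_3) = 2$ intervals for an individual already in it, which is the $(3, 3)$ entry of $\mathbf{N}$.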


4.3.3 Longevity 


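The longevity of an individual in state $j$ can be equated to the total occupancy time
of all transient states by that individual, prior to eventual absorption. Let $\eta_j$ be this
longevity; the expectation of $\eta_j$ is the sum of the elements in column $j$ of $\mathbf{N}$. We
define $\boldsymbol{\eta}_1$ and $\boldsymbol{\eta}_2$ as the vectors containing the first and second moments of longevity,
respectively. Then

$$E(\boldsymbol{\eta})^{\mathsf T} = \boldsymbol{\eta}_1^{\mathsf T} = \mathbf{1}^{\mathsf T}\mathbf{N} \tag{4.21}$$

Figure 4.2a shows the life expectancy for India in 1961 and Japan in 2006.
The vector of the second moments of longevity satisfies

$$\boldsymbol{\eta}_2^{\mathsf T} = \boldsymbol{\eta}_1^{\mathsf T}\,(2\mathbf{N} - \mathbf{I}) \tag{4.22}$$

(Iosifescu 1980). The variance and standard deviation of longevity are thus

$$V(\boldsymbol{\eta}) = \boldsymbol{\eta}_2 - \boldsymbol{\eta}_1 \circ \boldsymbol{\eta}_1 \tag{4.23}$$

$$SD(\boldsymbol{\eta}) = \sqrt{V(\boldsymbol{\eta})} \tag{4.24}$$

where the square root is taken element-wise.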
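
The moments can be computed in a few lines. This sketch assumes the same kind of hypothetical three-age-class matrix as a toy input; the numerical values are illustrative and do not correspond to the India or Japan life tables.

```python
import numpy as np

# Hypothetical age-classified transient matrix (open-ended last class).
U = np.array([
    [0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.8, 0.5],
])
I = np.eye(3)
N = np.linalg.inv(I - U)

eta1 = np.ones(3) @ N            # Eq. (4.21): life expectancy by starting class
eta2 = eta1 @ (2 * N - I)        # Eq. (4.22): second moments
V = eta2 - eta1 * eta1           # Eq. (4.23): element-wise (Hadamard) square
SD = np.sqrt(V)                  # Eq. (4.24): element-wise square root

# eta1 is approximately [3.34, 2.6, 2.0] and V approximately [2.6244, 2.24, 2.0].
```

For the open-ended class the time to death is geometric with survival probability 0.5, whose mean is 2 and variance is $0.5/(1 - 0.5)^2 = 2$, matching the last entries.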


² Because time is discrete here, the number of visits is equal to the number of time increments,
which is the amount of time spent in the state. In continuous-time models, the number of visits to,
and the length of time spent in, a transient state are different. The corresponding calculations for
continuous-time models are given in Chap. 12.

Fig. 4.2 Calculations for longevity of India (1961) and Japan (2006). (a) Remaining life
expectancy as a function of age. (b) Standard deviation of remaining longevity as a function of
age. Vertical line at age 10 indicates $SD_{10}$, sometimes used as a measure of lifespan disparity. (c)
Sensitivity of life expectancy at birth to changes in mortality at each age. (d) Sensitivity of variance in
longevity at birth to changes in mortality at each age. (e) Sensitivity of life disparity $\boldsymbol{\eta}^\dagger$ to changes
in mortality at each age



Note that $V(\boldsymbol{\eta})$ and $SD(\boldsymbol{\eta})$ are vectors; their elements give the variance or
standard deviation of longevity for individuals in each stage, making it easy to
examine variation in remaining longevity conditional on the starting age. This
conditioning can be important; Edwards and Tuljapurkar (2005) have made a strong
case that $SD(\eta_{10})$, starting from age 10, is a good index to prevent infant and child
mortality from obscuring patterns in old age longevity.

Figure 4.2b shows $SD(\boldsymbol{\eta})$ for India and Japan. The standard deviation at birth,
$SD(\eta_1)$, is roughly twice as great in India as in Japan, a discrepancy that remains at
$SD(\eta_{10})$. Eventually, beyond the age of 50, $SD(\boldsymbol{\eta})$ becomes greater in Japan than in
India.


4.3.4 Age or Stage at Death 


If the model contains more than one absorbing state (as in all the cases but the first in
Fig. 4.1), the eventual fate of an individual is uncertain. The probability distributions
of the eventual absorbing state are given by the columns of the matrix

$$\mathbf{B} = \mathbf{M}\mathbf{N} \tag{4.25}$$

where $b_{ij}$ is the probability of eventual absorption in absorbing state $i$ for an
individual starting in transient state $j$ (Iosifescu 1980).

Suppose that the absorbing stages are defined as the age (or stage) at death, as in
Fig. 4.1b, d. Then $\mathbf{M}$ is given by Eq. (4.17) and the $j$th column of $\mathbf{B}$ is the probability
distribution of age at death for an individual starting in age class $j$:

$$\boldsymbol{\psi}_j = \mathbf{B}(:, j) = \mathbf{B}\mathbf{e}_j. \tag{4.26}$$


4.3.5 Life Lost and Life Disparity 


When an individual dies, it loses the remaining life that it would have experienced,
had it not died. This counterfactual proposition seems abstract, but we can make it
concrete by asking for the expectation of that lost lifetime. An individual that dies at
age $x$ will lose, on average, an amount of life given by the life expectancy at age $x$.
Averaging this remaining life expectancy over the distribution of age at death gives
the mean life lost due to mortality. Vaupel and Canudas Romo (2003) denoted the
life lost by $e^\dagger$. Here we define the vector $\boldsymbol{\eta}^\dagger$, whose $i$th entry is the expected life lost
due to mortality by an individual starting in age class $i$; it is given by

$$\boldsymbol{\eta}^\dagger = \mathbf{B}^{\mathsf T}\boldsymbol{\eta}_1. \tag{4.27}$$

Calculations of life lost from mortality due to specific causes of death play a
central role in the calculations of disability-adjusted life years (DALYs) used in 
calculations of the burden of diseases (e.g., Devleesschauwer et al. 2014; GBD 
2016 DALYs and HALE Collaborators 2017). See Caswell and Zarulli (2018) for 
the relationship between DALY calculations and Markov chain methods, and for a 
calculation of the variance in life lost. 

The life lost $\boldsymbol{\eta}^\dagger$ has an additional interpretation as a measure of disparity.
Consider a population in which everyone dies at the same age. In such a situation,
$\boldsymbol{\eta}^\dagger = 0$, because at the age of death, there is no additional life expectancy. Thus $\boldsymbol{\eta}^\dagger$ is
a measure of “life disparity;” the larger its value, the more disparity there is among
individuals in age at death (Vaupel et al. 2011).

The values of life disparity in age class 1, for Japan and India, in years, are

$$\eta^\dagger_1 = \begin{cases} 10.1 & \text{Japan} \\ 23.9 & \text{India} \end{cases} \tag{4.28}$$


Just as India has a much larger variance in longevity than Japan, it also has a higher 
life disparity. 
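
The distribution of age at death and the life lost can be sketched together. The example assumes a hypothetical three-age-class model with absorbing states defined by age class at death, so that the mortality matrix is diagonal with entries $1 - p_i$; the numbers are invented and are unrelated to the Japan and India values above.

```python
import numpy as np

p = np.array([0.9, 0.8, 0.5])            # hypothetical survival probabilities
U = np.array([
    [0.0,  0.0,  0.0],
    [p[0], 0.0,  0.0],
    [0.0,  p[1], p[2]],
])
I = np.eye(3)
M = I - np.diag(p)                       # death in class i, recorded as state i

N = np.linalg.inv(I - U)                 # Eq. (4.20)
eta1 = N.sum(axis=0)                     # Eq. (4.21)
B = M @ N                                # Eq. (4.25)

psi_1 = B[:, 0]                          # Eq. (4.26): age-at-death distribution
eta_dagger = B.T @ eta1                  # Eq. (4.27): expected life lost

# psi_1 is approximately [0.1, 0.18, 0.72] and sums to one;
# eta_dagger is approximately [2.242, 2.12, 2.0].
```

Each column of `B` is a probability distribution, which gives a quick check that `U` and `M` are consistent with each other.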


4.4 Sensitivity Analysis 


Our goal is to obtain expressions for the derivatives of $E(\boldsymbol{\eta})$, $V(\boldsymbol{\eta})$, $SD(\boldsymbol{\eta})$, $\mathbf{B}$, and
$\boldsymbol{\eta}^\dagger$, with respect to changes in age-specific mortality rates. The calculations and
some results (contrasting the mortality schedules of Japan and India) are given here.
More details are presented in Chaps. 5 and 11. Results are presented in terms of an
arbitrary vector $\boldsymbol{\theta}$ of parameters on which $\mathbf{U}$ and $\mathbf{M}$ depend. In the examples, $\boldsymbol{\theta}$ will
be the vector $\boldsymbol{\mu}$ of age-specific mortality rates.


4.4.1 Sensitivity of the Fundamental Matrix 


The fundamental matrix N appears in many of these formulas. Its sensitivity was 
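first obtained by Caswell (2006). Suppose that $\mathbf{U}$ is a function of some vector $\boldsymbol{\theta}$ of
parameters. Then

$$\frac{d\,\mathrm{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf T}} = \left(\mathbf{N}^{\mathsf T} \otimes \mathbf{N}\right) \frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf T}} \tag{4.29}$$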


(see Chap. 5).


4.4.2 Sensitivity of Life Expectancy


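The sensitivity of the vector of life expectancy as a function of age is obtained by
differentiating (4.21),

$$d\boldsymbol{\eta}_1^{\mathsf T} = \mathbf{1}^{\mathsf T}(d\mathbf{N}). \tag{4.30}$$

Applying the vec operator and Roth’s theorem (2.13) gives

$$d\boldsymbol{\eta}_1 = \left(\mathbf{I} \otimes \mathbf{1}^{\mathsf T}\right) d\,\mathrm{vec}\,\mathbf{N} \tag{4.31}$$

$$= \left(\mathbf{I} \otimes \mathbf{1}^{\mathsf T}\right)\left(\mathbf{N}^{\mathsf T} \otimes \mathbf{N}\right) d\,\mathrm{vec}\,\mathbf{U} \tag{4.32}$$

$$= \left(\mathbf{N}^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right) d\,\mathrm{vec}\,\mathbf{U}. \tag{4.33}$$

The last step uses the fact that $(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D})$. Applying the chain
rule and the first identification theorem gives the result

$$\frac{d\boldsymbol{\eta}_1}{d\boldsymbol{\theta}^{\mathsf T}} = \left(\mathbf{N}^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right) \frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf T}}. \tag{4.34}$$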
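
Both (4.29) and (4.33) can be checked against a finite-difference perturbation of a single entry of $\mathbf{U}$. The sketch below assumes a hypothetical three-age-class matrix and the column-stacking vec convention; the values and names are illustrative.

```python
import numpy as np

# Hypothetical age-classified transient matrix.
U = np.array([
    [0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.8, 0.5],
])
s = U.shape[0]
I = np.eye(s)

def fundamental(U):
    return np.linalg.inv(I - U)

N = fundamental(U)
eta1 = N.sum(axis=0)                          # Eq. (4.21)

dvecN_dvecU = np.kron(N.T, N)                 # Eq. (4.29), s^2 x s^2
deta1_dvecU = np.kron(N.T, eta1[None, :])     # Eq. (4.33), s x s^2

# Finite-difference check: perturb the single entry U[i, j].
i, j, h = 1, 0, 1e-7
U_pert = U.copy()
U_pert[i, j] += h
N_pert = fundamental(U_pert)

col = i + j * s                               # position of u_ij in vec(U)
fd_N = ((N_pert - N) / h).reshape(-1, order="F")
fd_eta1 = (N_pert.sum(axis=0) - eta1) / h
```

Each column of `dvecN_dvecU` holds the response of every entry of $\mathbf{N}$ to one entry of $\mathbf{U}$, so a single perturbation checks one column at a time.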


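Sensitivity to mortality If interest focuses on changes in age-specific mortality, so
that $\boldsymbol{\theta} = \boldsymbol{\mu}$, then the sensitivity formula expands, using the chain rule, to

$$\frac{d\boldsymbol{\eta}_1}{d\boldsymbol{\mu}^{\mathsf T}} = \left(\mathbf{N}^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right) \frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\mu}^{\mathsf T}}. \tag{4.35}$$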


This can be evaluated in several ways, depending on how the matrix U is written as 
a function of mortality. One approach is used in Sect. 4.4.3, and a somewhat more 
widely useful approach in Sect. 4.4.4. 

The results for Japan and India are shown in Fig. 4.2. Life expectancy is more 
sensitive to changes in mortality in Japan than in India; the (absolute value of) 
sensitivity decreases almost linearly with age in Japan, and slightly less linearly 
in India (Fig. 4.2). On the other hand, life expectancy is more elastic to changes in 
mortality in India, and less so in Japan. 


4.4.3 Generalizing the Keyfitz-Pollard Formula 


The Keyfitz-Pollard formula for the sensitivity of life expectancy to changes in 
mortality rate, given in Eq. (4.2), has a clear interpretation: the sensitivity to
mortality at age a depends on the probability of survival to age a and the remaining 
life expectancy at age $a$. We are now in a position to generalize this to stage-classified
matrix models.

First, we derive the matrix version of the Keyfitz-Pollard result, for the sensitivity
of life expectancy of age class 1, which is

$$dE(\eta_1) = \left(\mathbf{e}_1^{\mathsf T} \otimes \mathbf{1}^{\mathsf T}\right) d\,\mathrm{vec}\,\mathbf{N} \tag{4.36}$$

$$= \left(\mathbf{e}_1^{\mathsf T} \otimes \mathbf{1}^{\mathsf T}\right)\left(\mathbf{N}^{\mathsf T} \otimes \mathbf{N}\right) d\,\mathrm{vec}\,\mathbf{U} \tag{4.37}$$

Consider a population with $s$ age classes and let $\mu_i$ be the mortality rate and $p_i =
\exp(-\mu_i)$ the survival probability for age class $i$. The matrix $\mathbf{U}$ is given by (4.13),
which can be written

$$\mathbf{U} = \sum_{k=1}^{s-1} \left(\mathbf{e}_{k+1}\mathbf{e}_k^{\mathsf T}\right) p_k \tag{4.38}$$

where $\mathbf{e}_k$ is the unit vector, of length $s$, with a 1 in the $k$th position and zeros
elsewhere. Differentiating $\mathbf{U}$ and applying the vec operator gives

$$d\,\mathrm{vec}\,\mathbf{U} = -\sum_{k=1}^{s-1} \left(\mathbf{e}_k \otimes \mathbf{e}_{k+1}\right) p_k\,(d\mu_k) \tag{4.39}$$


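Substitute (4.39) into (4.37) and consider a perturbation of mortality at age $a$; the
result is

$$\frac{dE(\eta_1)}{d\mu_a} = -\left(\mathbf{e}_1^{\mathsf T} \otimes \mathbf{1}^{\mathsf T}\right)\left(\mathbf{N}^{\mathsf T} \otimes \mathbf{N}\right)\left(\mathbf{e}_a \otimes \mathbf{e}_{a+1}\right) p_a. \tag{4.40}$$

This simplifies to

$$\frac{dE(\eta_1)}{d\mu_a} = -\left(\mathbf{e}_1^{\mathsf T}\mathbf{N}^{\mathsf T}\mathbf{e}_a \otimes \mathbf{1}^{\mathsf T}\mathbf{N}\mathbf{e}_{a+1}\right) p_a \tag{4.41}$$

$$= -\underbrace{E(\nu_{a1})}_{\text{survival}}\; p_a\, \underbrace{E(\eta_{a+1})}_{\text{expectancy}} \qquad \text{age-classified} \tag{4.42}$$

In an age-classified model, $\nu_{a1}$ is either 0 or 1 (you cannot occupy a year of age for
more than 1 year); hence $E(\nu_{a1})$ is the probability of survival to age $a$. Thus we
have a matrix version of the Keyfitz-Pollard result: the sensitivity of life expectancy
is the probability of survival to age $a$ times the probability of survival from $a$ to
$a + 1$, times the life expectancy at age $a + 1$.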

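Now apply the same approach to a stage-classified model, in which $\mathbf{U}$ can be
written as the product of a diagonal matrix $\boldsymbol{\Sigma}$ with survival probabilities on the
diagonal, and a stochastic matrix $\mathbf{G}$ giving the transition probabilities conditional
on survival:

$$\mathbf{U} = \mathbf{G}\boldsymbol{\Sigma} \tag{4.43}$$

$$= \mathbf{G} \begin{pmatrix} p_1 & & 0 \\ & \ddots & \\ 0 & & p_s \end{pmatrix} \tag{4.44}$$

$$= \mathbf{G} \sum_{k=1}^{s} \left(\mathbf{e}_k\mathbf{e}_k^{\mathsf T}\right) p_k \tag{4.45}$$

Differentiating and applying the vec operator gives

$$d\,\mathrm{vec}\,\mathbf{U} = -\sum_{k=1}^{s} \left(\mathbf{e}_k \otimes \mathbf{G}\mathbf{e}_k\right) p_k\,(d\mu_k) \tag{4.46}$$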


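Substitute this into (4.37) and focus on a change in mortality at stage $a$; the result is

$$\frac{dE(\eta_1)}{d\mu_a} = -\left(\mathbf{e}_1^{\mathsf T} \otimes \mathbf{1}^{\mathsf T}\right)\left(\mathbf{N}^{\mathsf T} \otimes \mathbf{N}\right)\left(\mathbf{e}_a \otimes \mathbf{G}\mathbf{e}_a\right) p_a \tag{4.47}$$

which simplifies to

$$\frac{dE(\eta_1)}{d\mu_a} = -\left(\mathbf{e}_1^{\mathsf T}\mathbf{N}^{\mathsf T}\mathbf{e}_a \otimes \mathbf{1}^{\mathsf T}\mathbf{N}\mathbf{G}\mathbf{e}_a\right) p_a \tag{4.48}$$

$$= -E(\nu_{a1})\, E(\boldsymbol{\eta}^{\mathsf T})\, \mathbf{G}(:, a)\, p_a \tag{4.49}$$

$$= -\underbrace{E(\nu_{a1})}_{\text{occupancy}} \sum_{k=1}^{s} \underbrace{p_a\, g_{ka}}_{\text{transitions}}\, \underbrace{E(\eta_k)}_{\text{expectancy}} \qquad \text{stage-classified} \tag{4.50}$$

Equation (4.50) is the stage-classified version of Keyfitz-Pollard: the sensitivity
of life expectancy to a change in mortality in stage $j$ is the product of the expected
time spent in stage $j$ and the remaining life expectancy, calculated as an average of
the life expectancy of all stages $k$, weighted by the probability of transition from $j$
to $k$. This can be simplified further by noting that, for either age or stage-classified
populations, $\mathbf{G}(:, a)\, p_a = \mathbf{U}(:, a)$, so that a completely general expression is

$$\frac{dE(\eta_1)}{d\mu_a} = -E(\nu_{a1})\, E(\boldsymbol{\eta}^{\mathsf T})\, \mathbf{U}(:, a) \qquad \text{age- or stage-classified} \tag{4.51}$$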
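
The general expression (4.51) is easy to evaluate for every stage at once. The sketch below is an illustration under stated assumptions: a hypothetical three-stage model with an invented column-stochastic matrix `G` and invented mortality rates, checked against central finite differences.

```python
import numpy as np

# Hypothetical transition matrix conditional on survival (columns sum to 1).
G = np.array([
    [0.6, 0.1, 0.0],
    [0.4, 0.5, 0.2],
    [0.0, 0.4, 0.8],
])
mu = np.array([0.3, 0.2, 0.5])           # hypothetical stage-specific mortality

def life_expectancy(mu):
    U = G @ np.diag(np.exp(-mu))         # Eq. (4.43): U = G Sigma
    N = np.linalg.inv(np.eye(3) - U)     # Eq. (4.20)
    return N.sum(axis=0), N, U           # eta1 = 1^T N, Eq. (4.21)

eta1, N, U = life_expectancy(mu)

# Eq. (4.51) for a = 1, ..., s: -E(nu_a1) * eta1^T U(:, a).
sens = -N[:, 0] * (eta1 @ U)

# Central finite differences of the life expectancy of stage 1.
h = 1e-6
fd = np.array([
    (life_expectancy(mu + h * e)[0][0] - life_expectancy(mu - h * e)[0][0]) / (2 * h)
    for e in np.eye(3)
])
```

All entries of `sens` are negative or zero here, as expected: raising mortality in any stage that is visited cannot increase life expectancy.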


4.4.4 Sensitivity of the Variance of Longevity 


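The sensitivity of the variance in longevity is obtained by differentiating (4.23)

$$dV(\boldsymbol{\eta}) = d\boldsymbol{\eta}_2 - 2\left(\boldsymbol{\eta}_1 \circ d\boldsymbol{\eta}_1\right) \tag{4.52}$$

and applying the vec operator (using results from Chap. 2 on the vec of the
Hadamard product), to obtain

$$dV(\boldsymbol{\eta}) = d\boldsymbol{\eta}_2 - 2\,\mathcal{D}(\boldsymbol{\eta}_1)\, d\boldsymbol{\eta}_1. \tag{4.53}$$

The derivative of $\boldsymbol{\eta}_1$ is already given by (4.33):

$$d\boldsymbol{\eta}_1 = \left(\mathbf{N}^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right) d\,\mathrm{vec}\,\mathbf{U}. \tag{4.54}$$

The derivative of $\boldsymbol{\eta}_2$ is obtained by differentiating (4.22):

$$d\boldsymbol{\eta}_2^{\mathsf T} = 2\left(d\boldsymbol{\eta}_1^{\mathsf T}\right)\mathbf{N} + 2\boldsymbol{\eta}_1^{\mathsf T}(d\mathbf{N}) - d\boldsymbol{\eta}_1^{\mathsf T}. \tag{4.55}$$

Applying the vec operator to both sides and substituting (4.29) for $d\,\mathrm{vec}\,\mathbf{N}$ gives

$$d\boldsymbol{\eta}_2 = \left(2\mathbf{N}^{\mathsf T} - \mathbf{I}\right) d\boldsymbol{\eta}_1 + 2\left(\mathbf{N}^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}\right) d\,\mathrm{vec}\,\mathbf{U}. \tag{4.56}$$

Inserting (4.54) for $d\boldsymbol{\eta}_1$ and (4.56) for $d\boldsymbol{\eta}_2$ into (4.53) gives the sensitivity of
the variance in remaining longevity, for any starting age or stage, to changes in
$\mathbf{U}$. The sensitivity of longevity to mortality is obtained by differentiating $\mathbf{U}$ with
respect to $\boldsymbol{\mu}$.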


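Derivatives of $\mathbf{U}$ The derivatives of $\mathbf{U}$ with respect to the mortality vector $\boldsymbol{\mu}$ are obtained as
follows. For an age-classified model, define an age-advancement matrix

$$\mathbf{L} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & [1] \end{pmatrix} \tag{4.57}$$

(shown here for three age classes, with the optional open-ended last age class).
This matrix will mask the entries of a matrix $\mathbf{1}\mathbf{p}^{\mathsf T}$, that contains $\mathbf{p}$ in each row,
to obtain

$$\mathbf{U} = \mathbf{L} \circ \left(\mathbf{1}\mathbf{p}^{\mathsf T}\right). \tag{4.58}$$

Differentiating and applying the vec operator gives

$$d\mathbf{U} = \mathbf{L} \circ \left(\mathbf{1}\,(d\mathbf{p}^{\mathsf T})\right) \tag{4.59}$$

$$d\,\mathrm{vec}\,\mathbf{U} = \mathcal{D}(\mathrm{vec}\,\mathbf{L})\left(\mathbf{I} \otimes \mathbf{1}\right) d\mathbf{p}. \tag{4.60}$$

Since $\mathbf{p} = \exp(-\boldsymbol{\mu})$,

$$d\mathbf{p} = -\mathcal{D}(\mathbf{p})\, d\boldsymbol{\mu}, \tag{4.61}$$

and hence

$$d\,\mathrm{vec}\,\mathbf{U} = -\mathcal{D}(\mathrm{vec}\,\mathbf{L})\left(\mathbf{I} \otimes \mathbf{1}\right)\mathcal{D}(\mathbf{p})\, d\boldsymbol{\mu} \qquad \text{age-classified} \tag{4.62}$$

For a stage-classified model, write $\mathbf{U} = \mathbf{G}\boldsymbol{\Sigma}$, as in (4.44), as

$$\mathbf{U} = \mathbf{G}\left[\mathbf{I} \circ \left(\mathbf{1}\mathbf{p}^{\mathsf T}\right)\right]. \tag{4.63}$$

Differentiating and applying the vec operator, following the strategy of (4.60), gives

$$d\,\mathrm{vec}\,\mathbf{U} = -\left(\mathbf{I} \otimes \mathbf{G}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I})\left(\mathbf{I} \otimes \mathbf{1}\right)\mathcal{D}(\mathbf{p})\, d\boldsymbol{\mu} \qquad \text{stage-classified} \tag{4.64}$$

Substituting (4.62) and (4.64) into the expressions for $d\boldsymbol{\eta}_1$ and $d\boldsymbol{\eta}_2$ and
substituting those into (4.53) gives the sensitivity of the variance in longevity to
age- or stage-specific mortality. It is possible to carry out the substitutions and to
arrive at a single (large) expression for $dV(\boldsymbol{\eta})$; see Chap. 5.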
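
The chain of substitutions can be assembled numerically rather than symbolically. The following sketch assumes a hypothetical three-age-class model with an open-ended last class and invented mortality rates; it builds (4.62), (4.54), (4.56), and (4.53) in turn and compares the result with central finite differences.

```python
import numpy as np

s = 3
L = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])          # Eq. (4.57), open-ended last class
I = np.eye(s)
one = np.ones((s, 1))

def moments(mu):
    p = np.exp(-mu)
    U = L * p                            # Eq. (4.58): column j scaled by p[j]
    N = np.linalg.inv(I - U)
    eta1 = N.sum(axis=0)                 # Eq. (4.21)
    eta2 = eta1 @ (2 * N - I)            # Eq. (4.22)
    return N, eta1, eta2 - eta1 ** 2     # variance, Eq. (4.23)

mu = np.array([0.1, 0.2, 0.7])           # hypothetical mortality rates
N, eta1, V = moments(mu)
p = np.exp(-mu)

# Eq. (4.62): d vec U / d mu^T, an s^2 x s matrix.
dvecU = -np.diag(L.reshape(-1, order="F")) @ np.kron(I, one) @ np.diag(p)
# Eq. (4.54): d eta1 / d mu^T.
deta1 = np.kron(N.T, eta1[None, :]) @ dvecU
# Eq. (4.56): d eta2 / d mu^T.
deta2 = (2 * N.T - I) @ deta1 + 2 * np.kron(N.T, (eta1 @ N)[None, :]) @ dvecU
# Eq. (4.53): d V / d mu^T; entry (i, k) is dV_i / d mu_k.
dV = deta2 - 2 * np.diag(eta1) @ deta1

# Central finite differences, one column per mortality rate.
h = 1e-6
fd = np.column_stack([
    (moments(mu + h * e)[2] - moments(mu - h * e)[2]) / (2 * h)
    for e in np.eye(s)
])
```

The first row of `dV` corresponds to the variance of longevity at birth, the quantity plotted against age of perturbation in sensitivity figures of this kind.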

Figure 4.2d shows the sensitivity and elasticity of variance of longevity to 
changes in age-specific mortality. The variance is more sensitive to mortality 
changes in Japan than in India, and the sensitivities are highest at young ages. Both 
life tables have the property that sensitivities are positive at early ages (~0-20 for 
India, ~0-80 for Japan) and then become negative. Before this age, reductions in 
mortality will reduce variance; after this age, reductions in mortality increase the 
variance. See Sect. 4.4.6 for more on this. 


4.4.5 Sensitivity of the Distribution of Age at Death 


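The sensitivity of the distribution of age or stage at death is obtained by differentiating
(4.25) and applying the vec operator,

$$d\,\mathrm{vec}\,\mathbf{B} = \left(\mathbf{N}^{\mathsf T} \otimes \mathbf{I}\right) d\,\mathrm{vec}\,\mathbf{M} + \left(\mathbf{I} \otimes \mathbf{M}\right) d\,\mathrm{vec}\,\mathbf{N}. \tag{4.65}$$

We already know $d\,\mathrm{vec}\,\mathbf{N}$. To obtain $d\,\mathrm{vec}\,\mathbf{M}$, note that when the absorbing states are
defined in terms of stage at death

$$\mathbf{M} = \mathbf{I} - \mathcal{D}(\mathbf{p}) \tag{4.66}$$

and thus

$$d\,\mathrm{vec}\,\mathbf{M} = -\mathcal{D}(\mathrm{vec}\,\mathbf{I})\left(\mathbf{I} \otimes \mathbf{1}\right) d\mathbf{p}. \tag{4.67}$$

It is revealing to write the sensitivity of $\mathbf{B}$ to changes in mortality using the chain
rule,

$$\frac{d\,\mathrm{vec}\,\mathbf{B}}{d\boldsymbol{\mu}^{\mathsf T}} = \left(\mathbf{N}^{\mathsf T} \otimes \mathbf{I}\right) \frac{d\,\mathrm{vec}\,\mathbf{M}}{d\mathbf{p}^{\mathsf T}} \frac{d\mathbf{p}}{d\boldsymbol{\mu}^{\mathsf T}} + \left(\mathbf{I} \otimes \mathbf{M}\right) \frac{d\,\mathrm{vec}\,\mathbf{N}}{d\,\mathrm{vec}^{\mathsf T}\mathbf{U}} \frac{d\,\mathrm{vec}\,\mathbf{U}}{d\mathbf{p}^{\mathsf T}} \frac{d\mathbf{p}}{d\boldsymbol{\mu}^{\mathsf T}} \tag{4.68}$$

and to recognize how many of the pieces we have already obtained.
The distribution of stage at death for individuals starting in stage $j$ is given by
column $j$ of $\mathbf{B}$; i.e., $\boldsymbol{\psi}_j = \mathbf{B}(:, j)$. The sensitivity of $\boldsymbol{\psi}_j$ to changes in mortality is

$$\frac{d\boldsymbol{\psi}_j}{d\boldsymbol{\mu}^{\mathsf T}} = \left(\mathbf{e}_j^{\mathsf T} \otimes \mathbf{I}\right) \frac{d\,\mathrm{vec}\,\mathbf{B}}{d\boldsymbol{\mu}^{\mathsf T}} \tag{4.69}$$

for any age or stage $j$ of interest.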


4.4.6 Sensitivity of Life Disparity 


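To get the sensitivity of the vector $\boldsymbol{\eta}^\dagger$, differentiate and apply the vec operator to
Eq. (4.27), which gives

$$d\boldsymbol{\eta}^\dagger = \mathbf{B}^{\mathsf T} d\boldsymbol{\eta}_1 + \left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right) d\,\mathrm{vec}\,\mathbf{B}. \tag{4.70}$$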
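
The pieces derived so far can be combined into a numerical evaluation of (4.70). The sketch below assumes a hypothetical three-age-class model with absorbing states defined by age class at death, so that (4.66) applies; the mortality rates and the helper name `model` are invented, and the result is compared with finite differences.

```python
import numpy as np

s = 3
L = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])          # age-advancement matrix, Eq. (4.57)
I = np.eye(s)
one = np.ones((s, 1))

def model(mu):
    p = np.exp(-mu)
    U = L * p                            # Eq. (4.58)
    N = np.linalg.inv(I - U)             # Eq. (4.20)
    M = I - np.diag(p)                   # Eq. (4.66)
    B = M @ N                            # Eq. (4.25)
    eta1 = N.sum(axis=0)                 # Eq. (4.21)
    return p, N, M, B, eta1, B.T @ eta1  # last item: Eq. (4.27)

mu = np.array([0.1, 0.2, 0.7])           # hypothetical mortality rates
p, N, M, B, eta1, eta_dagger = model(mu)

dp = -np.diag(p)                                                    # Eq. (4.61)
dvecU = np.diag(L.reshape(-1, order="F")) @ np.kron(I, one) @ dp    # Eq. (4.62)
dvecM = -np.diag(I.reshape(-1, order="F")) @ np.kron(I, one) @ dp   # Eq. (4.67)
dvecN = np.kron(N.T, N) @ dvecU                                     # Eq. (4.29)
dvecB = np.kron(N.T, I) @ dvecM + np.kron(I, M) @ dvecN             # Eq. (4.65)
deta1 = np.kron(N.T, eta1[None, :]) @ dvecU                         # Eq. (4.35)
deta_dagger = B.T @ deta1 + np.kron(I, eta1[None, :]) @ dvecB       # Eq. (4.70)

# Central finite differences, one column per mortality rate.
h = 1e-6
fd = np.column_stack([
    (model(mu + h * e)[5] - model(mu - h * e)[5]) / (2 * h)
    for e in np.eye(s)
])
```

Entry $(i, k)$ of `deta_dagger` is the sensitivity of life disparity for an individual starting in age class $i$ to mortality in age class $k$.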


Evaluating this expression for the data on India and Japan, we see that the
sensitivity of $\boldsymbol{\eta}^\dagger$ shows a pattern similar to that of the sensitivity of $V(\boldsymbol{\eta})$ (Fig. 4.2),
confirming that these indices are measuring similar aspects of disparity in longevity.

In particular, they show the existence of a critical age, before which reductions
in mortality reduce disparity and after which they have the opposite effect. Zhang
and Vaupel (2009) showed that this critical age, which they describe as separating
“early” from “late” deaths, is a general property of $\boldsymbol{\eta}^\dagger$. Although the details depend
on which index of disparity one uses, the existence of a critical age separating 
positive and negative sensitivities is also a property of other measures of variation in 
longevity (Van Raalte and Caswell 2013). Vaupel et al. (2011) have used the critical 
age to decompose historical changes in lifespan disparity into components due to 
early and late mortality.


4.5 A Time-Series LTRE Decomposition: Life Disparity


The LTRE decomposition analysis in Sect. 2.9 can be used to decompose time series
such as these into their components. We apply it here to calculate the contributions,
to a long trajectory of changes in $\boldsymbol{\eta}^\dagger$, of changes in early and late mortality.

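Suppose that some demographic outcome $\boldsymbol{\xi}(t)$ (dimension $s \times 1$) is measured
as a function of a parameter vector $\boldsymbol{\theta}$ (dimension $p \times 1$), at times $1, 2, \ldots, T$. The
changes in $\boldsymbol{\xi}(t)$ over time result from the changes in the parameters,

$$\Delta\boldsymbol{\xi}(t) = \boldsymbol{\xi}(t + 1) - \boldsymbol{\xi}(t) \tag{4.71}$$

$$\Delta\boldsymbol{\theta}(t) = \boldsymbol{\theta}(t + 1) - \boldsymbol{\theta}(t) \tag{4.72}$$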


The decomposition analysis for such sequences was introduced as a “regression 
LTRE” method in the context of ecotoxicology and response to environmental 
factors (e.g., Caswell 1996; Knight et al. 2009). The same approach was introduced 
independently by Horiuchi et al. (2008) to decompose differences between two 
conditions by imagining a continuous path from one to the other. 

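The analysis starts by considering the change in $\boldsymbol{\xi}$ over time,


$$ \frac{d\boldsymbol{\xi}(t)}{dt} = \frac{d\boldsymbol{\xi}(t)}{d\boldsymbol{\theta}^{\mathsf T}(t)} \, \frac{d\boldsymbol{\theta}(t)}{dt} \tag{4.73} $$

If the time series is evaluated at discrete times $t = 1, \ldots, T$, then to first order

$$ \Delta\boldsymbol{\xi}(t) \approx \frac{d\boldsymbol{\xi}(t)}{d\boldsymbol{\theta}^{\mathsf T}(t)} \, \Delta\boldsymbol{\theta}(t) \qquad s \times 1 \tag{4.74} $$

The contributions to $\Delta\boldsymbol{\xi}(t)$ are displayed separately in a contribution matrix

$$ \mathbf{C}(t) = \frac{d\boldsymbol{\xi}(t)}{d\boldsymbol{\theta}^{\mathsf T}(t)} \, \mathcal{D}\left[ \Delta\boldsymbol{\theta}(t) \right] \qquad s \times p \tag{4.75} $$

The $(i, j)$ entry of $\mathbf{C}(t)$ is the contribution of $\Delta\theta_j(t)$ to $\Delta\xi_i(t)$. The contributions
are additive over time, so the contributions of all the changes, integrated from $t_1$ to $t_2$,
are given by the entries of


$$ \mathbf{C}(t_1, t_2) = \sum_{t=t_1}^{t_2} \mathbf{C}(t) \tag{4.76} $$
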

Suppose the dependent variable is $\boldsymbol{\xi} = \boldsymbol{\eta}^\dagger$ and the parameter vector is $\boldsymbol{\theta} = \boldsymbol{\mu}$.

At each time and for each age, we aggregate the contributions from early and late
mortality. Let X be an indicator matrix whose entries define whether a particular
entry of C(t) is to be counted as early or late:

$$ x_{ij} = \begin{cases} 1 & \theta_j \text{ contributes to } \Delta\xi_i \\ 0 & \text{otherwise} \end{cases} \tag{4.77} $$

Then

$$ \mathbf{c}(t) = \left( \mathbf{C}(t) \circ \mathbf{X} \right) \mathbf{1} \tag{4.78} $$


is a vector giving the contributions to the change in $\boldsymbol{\xi}$ from the parameters chosen in
X. Defining $\mathbf{X}_{\rm early}$ and $\mathbf{X}_{\rm late}$ gives changes at time $t$ due to early and late mortality.
The LTRE analysis is then


$$ \mathbf{c}_{\rm early}(t_1, t_2) = \sum_{t=t_1}^{t_2} \mathbf{c}_{\rm early}(t) \tag{4.79} $$


and similarly for $\mathbf{c}_{\rm late}(t_1, t_2)$.
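
The steps in (4.73)-(4.79) can be sketched numerically. The following Python/NumPy example is only an illustration under stated assumptions, not the calculation behind Fig. 4.3: the outcome function `toy_outcome`, the hazard values, and the choice of which ages count as "early" are all hypothetical, and the derivative is approximated by central differences instead of the matrix calculus formulas of this chapter.

```python
import numpy as np

def jacobian(f, theta, h=1e-6):
    """Central-difference approximation to d f / d theta^T (s x p)."""
    theta = np.asarray(theta, dtype=float)
    s = np.atleast_1d(f(theta)).size
    J = np.zeros((s, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        J[:, j] = (np.atleast_1d(f(theta + step))
                   - np.atleast_1d(f(theta - step))) / (2 * h)
    return J

def contribution_matrices(f, thetas):
    """C(t) = (d xi / d theta^T) D[Delta theta(t)] for each t, as in (4.75)."""
    return [jacobian(f, thetas[t]) @ np.diag(thetas[t + 1] - thetas[t])
            for t in range(len(thetas) - 1)]

def toy_outcome(mu):
    """Hypothetical scalar outcome: summed survivorship under hazards mu."""
    return np.array([np.sum(np.exp(-np.cumsum(mu)))])

# Hypothetical trajectory: four age-specific hazards falling by 20% in 20 steps
mu_start = np.array([0.10, 0.02, 0.05, 0.30])
thetas = np.linspace(mu_start, 0.8 * mu_start, 21)   # shape (T, p)

C = contribution_matrices(toy_outcome, thetas)
p = thetas.shape[1]

# Indicator matrices (s x p): ages 1-2 counted as "early", ages 3-4 as "late"
X_early = np.array([[1.0, 1.0, 0.0, 0.0]])
X_late = 1.0 - X_early

# c(t) = (C(t) o X) 1, summed over time as in (4.78)-(4.79)
c_early = sum((Ct * X_early) @ np.ones(p) for Ct in C)
c_late = sum((Ct * X_late) @ np.ones(p) for Ct in C)

observed = toy_outcome(thetas[-1]) - toy_outcome(thetas[0])
print("early:", c_early, "late:", c_late)
print("sum of contributions:", c_early + c_late, "observed change:", observed)
```

Because the decomposition is first order, the summed contributions approximate the observed change rather than reproduce it exactly; the approximation improves as the steps between successive parameter vectors become smaller.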
As an example, Fig. 4.3a, b shows a time series of life expectancy (increasing
from about 40 to 80 years between 1800 and 2010) and life disparity for Swedish
females, based on data from Human Mortality Database (2016). As in most
developed countries, life disparity at birth dropped dramatically from 1850 to about
1950 (e.g., Edwards 2011; Vaupel et al. 2011). Declines at later ages were less
dramatic, and remaining life disparity conditional on survival to age 50 has been
almost flat (Engelman et al. 2014). How did changes in early and late mortality
contribute to these patterns?

Fig. 4.3 (a) Historical trends in life expectancy at birth from 1800 to 2010. (b) Historical trends
in life disparity (mean years of life lost due to mortality) for ages 0 and 50 years. (c) Contributions
from early and late mortality improvement to the change in disparity at age 0. (d) The contributions
for disparity at age 50. (Data for Swedish females, from the Human Mortality Database)

Figure 4.3c, d show the cumulative sums of the contributions $\mathbf{c}_{\rm early}$ and $\mathbf{c}_{\rm late}$, and
their total, for ages 0 and 50. The decline in life disparity at birth was driven almost
completely by improvements in early mortality, which completely overshadowed a 
small increase in disparity that was generated by improvements in late life mortality. 
The picture for remaining life disparity at age 50 is different: the contributions 
from changes in early and late life mortality almost completely cancel each other 
out. These patterns, looking at the details of a single time series, agree with the 
much more general exploration of multiple countries, using a different approach, by 
Vaupel et al. (2011). 

The accuracy of the decomposition can be evaluated by comparing the time 
series calculated from the total contributions, as shown in Fig. 4.3c, d, with the
observed series, as shown in Fig. 4.3b. The agreement is extremely close; the LTRE 
decomposition captures the end result of the historical changes from 1800 to 2010 
with an error of less than 0.1%. 


4.6 Conclusion 


Chap. 3 and this chapter contain examples of different approaches to the sensitivity
analysis of population growth rate and longevity, respectively. The power and
flexibility of matrix calculus methods is apparent: the models are not restricted to
age- or stage-classification, the absorbing states may be a single category of death or
some more diverse set, the demographic outcomes are not limited to expectations,
and the independent variables, the parameters that are being perturbed, can be
anything of interest. The only requirement is that a chain of functional dependence
can be followed: the outcome $\boldsymbol{\xi}$ depends on U, which depends on $\boldsymbol{\mu}$, which depends
on $\boldsymbol{\theta}$, ... and so on. Mortality might depend on health status, which might depend
on income level, which might depend on education, ..., and so on. The sensitivity
of $\boldsymbol{\xi}$ to any of these parameters is an application of the chain rule.


Chapter 5
Individual Stochasticity and Implicit Age Dependence


5.1 Introduction 


Demography is the study of the population consequences of the fates of individuals. 
As an individual organism develops through its life cycle it may increase in size, 
change its morphology, develop new physiological functions, exhibit new behaviors, 
or move to new locations. It may marry and divorce, become ill and recover, or 
change its employment status. It may change sex and/or change its reproductive 
status. These changes can be dramatic. This developmental process, and its attendant 
risks of death and opportunities for reproduction, determine the rates of birth and 
death that, in turn, determine population growth or decline. 

Individuals are differentiated on the basis of age or, in general, life cycle stages. 
The movement of an individual through its life cycle is a random process, and 
although the eventual destination (death) is certain, the pathways taken to that 
destination are stochastic and will differ even between identical individuals; this 
is individual stochasticity. A stage-classified demographic model contains implicit 
age-specific information, which can be analyzed using Markov chain methods. The 
living stages in the life cycles are transient states in an absorbing Markov chain, in 
which death is an absorbing state. 

This chapter presents Markov chain methods for computing the mean and 
variance of the lifetime number of visits to any transient state, the mean and variance 
of longevity, the net reproductive rate $R_0$, and the cohort generation time. It presents
the matrix calculus methods needed to calculate the sensitivity and elasticity of all 
these indices to any life history parameters. 


Chapter 5 is modified from Caswell, H. 2009. Stage, age, and individual stochasticity in 
demography. The Per Brinck Oikos Award Lecture 2008. Oikos 118:1763-1782. ©Hal Caswell

The Markov chain approach is then generalized to variable environments (deterministic
environmental sequences, periodic environments, iid random environments,
Markovian environments). Variable environments are analyzed using the
vec-permutation method to create a model that classifies individuals jointly by
stage and environmental condition. Throughout, examples are presented using the
North Atlantic right whale (Eubalaena glacialis) and an endangered prairie plant
(Lomatium bradshawii) in a stochastic fire environment.


5.1.1 Age and Stage, Implicit and Explicit 


The essence of demography is the connection between the fates of individual 
organisms and the dynamics of populations. There exist diverse mathematical 
frameworks in which this connection can be studied (Keyfitz 1967; Metz and Diekmann
1986; Nisbet and Gurney 1982; Caswell 1989; Tuljapurkar and Caswell 1997;
Caswell et al. 1997; DeAngelis and Gross 1992; Ellner et al. 2016). Regardless 
of the type of equations used, demographic analysis must account for differences 
among individuals, and the ways in which those differences affect the vital 
rates. 

Among the many ways that individuals may differ, age has long had a kind
of conceptual priority. Age is universal in the sense that every organism becomes
one minute older with the passage of one minute of time. Age is also often
associated with predictable changes in the vital rates. However, in some organisms
characteristics other than age provide more and better information about
an individual. Ecologists recognized this long ago, and have developed demographic
theory based on size, maturity, physiological condition, instar, spatial
location, etc., referred to in general as “stage-classified” demography. Human
demographers, who were responsible for the classical age-classified theory, by
no means deny the importance of other properties, such as employment, parity,
or health status; see Land and Rogers (1982), Goldman (1994), Robine
et al. (2003), and Willekens (2014) for a sample of the kinds of issues that
arise.

Even when the demographic model is entirely stage-classified, however, age is 
still implicitly present. Individuals in a given stage may differ in age, and individuals 
of a given age may be found in many different stages, but each individual still 
becomes one unit of age older with the passage of each unit of time. Extracting 
this implicit age-dependent information makes it possible to calculate interesting 
age-specific properties, such as survivorship, longevity, life expectancy, generation 
time, and net reproductive rate (Cochran and Ellner 1992; Caswell 2001, 2006; 
Tuljapurkar and Horvitz 2006; Horvitz and Tuljapurkar 2008).¹


¹ Explicit age and stage dependence is explored in Chap. 6; see also Caswell and Salguero-Gómez
(2013) and Caswell et al. (2018).

In this chapter, I show how to calculate some of these implicit age-specific
properties from any stage-classified model. The trick is to formulate the life cycle 
as a Markov chain, and to generalize the “life” cycle to include death as a stage. 
Because death is permanent, it is called an absorbing state, and the theory of 
absorbing Markov chains provides the starting point for our analysis (Feichtinger 
1971; Caswell 2001). 

A Markov chain is a stochastic model for the movement of a particle among a set 
of states (e.g., Kemeny and Snell 1976; Iosifescu 1980). The probability distribution 
of the next state of the particle may depend on the current state, but not on earlier 
states. In our context, a “particle” is an individual organism. The states correspond 
to the stages of the life cycle, plus death (or perhaps multiple types of death, for 
example deaths due to different causes). This structure is ideally suited to asking 
questions about individual stochasticity, because it accounts for all the possible 
pathways, and their probabilities, that an individual can follow through its life. I will 
focus on discrete-time models, but much of the theory can no doubt be generalized 
to continuous-time models. 

The use of Markov chains in demographic analysis is not new. As far as I 
know, Feichtinger (1971, 1973) was the first to use discrete-time absorbing Markov 
chains in demography, paying particular attention to competing risks and multiple 
causes of death. At around the same time, Hoem (1969) applied continuous-time 
Markov chains in the analysis of insurance systems (with states such as “active,” 
“disabled,” and “dead”). Later, Cochran and Ellner (1992) independently proposed 
the use of Markov chains to generate age-classified statistics from stage-classified 
models, but minimized the use of matrix notation in their presentation. Influenced by 
Feichtinger’s work, and relying heavily on Iosifescu’s (1980) treatment of absorbing 
Markov chains, I extended the calculations using matrix notation (Caswell 2001; 
Keyfitz and Caswell 2005), introduced sensitivity analysis (Caswell 2006), and 
presented results for both time-invariant and time-varying models. At the same 
time, Tuljapurkar and Horvitz (2006) and Horvitz and Tuljapurkar (2008) developed
the same approaches and presented a more extensive investigation of time
variation.


5.1.2 Individual Stochasticity and Heterogeneity 


Consider a newborn individual. As it develops through the stages of its life cycle, 
it may grow, shrink, mature, move, reproduce, and allocate resources among its 
biological processes. At each moment, it is exposed to various mortality risks. 
At each moment, it has some chance of reproducing. Because these processes are 
stochastic, the lives of any two individuals may differ. These random outcomes— 
this individual stochasticity—imply that the age-specific properties of an individual 
(say, longevity) are random variables—there is a distribution among individuals that 
should be characterized by its mean, moments, etc. (Caswell 2009).

It is critical to notice that the calculation of these moments explicitly assumes
that every individual in a given stage experiences exactly the same rates and 
hazards. There is no heterogeneity among the individuals (or, at least, none that 
matters demographically), even though there is variation in their lifetime properties. 
Empirical studies of longevity or lifetime reproductive output find that the variation 
among individuals is usually large, but it is a mistake to jump to the conclusion that 
it is due to heterogeneity among individuals without first examining the variance 
that is inevitably created by individual stochasticity (e.g., Tuljapurkar et al. 2009; 
Steiner and Tuljapurkar 2012; Caswell 2011; Caswell and Kluge 2015; Caswell and 
Vindenes 2018; Hartemink et al. 2017; Hartemink and Caswell 2018; van Daalen 
and Caswell 2017). 


5.1.3 Examples 


The calculations will be demonstrated by means of two case studies. The first is 
a stage-classified model for the North Atlantic right whale (Eubalaena glacialis).
Later, in Sect. 5.5.4, a stochastic matrix model for the threatened prairie plant 
Lomatium bradshawii will appear as part of a study of variable environments. 

The North Atlantic right whale is a large, highly endangered baleen whale (Kraus 
and Rolland 2007). Once abundant in the North Atlantic, it was decimated by
whaling, beginning as much as a thousand years ago (Reeves et al. 2007). By 1900 
the eastern North Atlantic stock had been effectively eliminated, and the western 
North Atlantic stock hunted to near extinction. The population has recovered only 
slowly since receiving at least nominal protection in 1935, and now numbers only 
about 300 individuals. Right whales migrate along the Atlantic coast of North 
America, from summer feeding grounds in the Gulf of Maine and Bay of Fundy to 
winter calving grounds off the Southeastern U.S. They are killed by ship collisions 
and entanglement in fishing gear (Kraus et al. 2005), and may also be affected by 
pollution of coastal waters. 

Individual right whales are photographically identifiable by scars and callosity 
patterns. Since 1980, the New England Aquarium has surveyed the population, 
accumulating a database of over 10,000 sightings (Crone and Kraus 1990). Treating 
the first year of identification of an individual as marking, and each year of 
resighting as a recapture, permits the use of mark-recapture statistics to estimate 
demographic parameters of this endangered population (Caswell et al. 1999; 
Fujiwara and Caswell 2001, 2002; Caswell and Fujiwara 2004). 

Figure 5.1 shows a life cycle graph used by Caswell and Fujiwara (2004) as the 
basis of a stage-structured matrix population model for the right whale. The stages 
are calves, immature females, mature but non-reproductive females, mothers, and 
“resting” mothers (because of the long period of parental care and gestation, right 
whales do not reproduce in the year after giving birth). This life cycle is typical of 
large, long-lived monovular mammals and birds.

Fig. 5.1 Absorbing Markov chain transition graph for females of the North Atlantic right whale
(Eubalaena glacialis). Projection interval is 1 year. Stages: 1 = calf, 2 = immature, 3 = mature,
4 = mother, 5 = post-breeding female, 6 = death. See Caswell and Fujiwara (2004) for explanation
and parameter estimates


The model is parameterized in terms of survival probabilities $\sigma_1, \ldots, \sigma_5$, the
probability of maturation $\gamma_2$, and the birth probability $\gamma_3$. The projection matrix is


$$
\mathbf{A} = \begin{pmatrix}
0 & 0 & F & 0 & 0 \\
\sigma_1 & \sigma_2 (1 - \gamma_2) & 0 & 0 & 0 \\
0 & \sigma_2 \gamma_2 & \sigma_3 (1 - \gamma_3) & 0 & \sigma_5 \\
0 & 0 & \sigma_3 \gamma_3 & 0 & 0 \\
0 & 0 & 0 & \sigma_4 & 0
\end{pmatrix} \tag{5.1}
$$


The fertility term in the (1, 3) position is $F = 0.5\,\sigma_3 \gamma_3 \sqrt{\sigma_4}$, accounting for the sex
ratio, the survival of mature females, their probability of giving birth if they survive,
and the effect of survival of the mother on survival of the calf. For reasons related
to parameter estimation, $\sigma_5$ is constrained to equal $\sigma_3$.


5.2 Markov Chains 


The familiar life cycle graph (e.g., Fig.5.1) corresponds to a projection matrix 
A, in which $a_{ij}$ gives the per-capita production of stage $i$ individuals at $t + 1$
by a stage j individual at t. This production may occur by the transition of an 
individual from stage j to stage i, or by the production of one or more new 
individuals (by reproduction, fragmentation, etc.). So, we partition A into a matrix 
U describing transition probabilities of extant individuals and a matrix F describing 
the production of new individuals 


A=U+F (5.2) 


The column sums of U are all less than or equal to 1. Because individuals eventually 
die and pass out of the stages contained in U, those stages are called transient states.


5.2.1 An Absorbing Markov Chain


If we include death explicitly (Fig. 5.1) and remove the arcs representing reproduction,
we obtain the graph corresponding to the transition matrix for an absorbing
Markov chain

$$ \mathbf{P} = \left( \begin{array}{c|c} \mathbf{U} & \mathbf{0} \\ \hline \mathbf{m}^{\mathsf T} & 1 \end{array} \right) \tag{5.3} $$


The element $m_j$ of the vector $\mathbf{m}$ is the probability of mortality of an individual in
stage j. Death is an absorbing state. I will assume that at least one absorbing state is 
accessible from any transient state in U, and that the spectral radius of U is strictly 
less than 1. This guarantees that, with probability 1, every individual ends up in the 
absorbing state. 


The right whale. Fujiwara estimated U by applying multi-stage mark-recapture
methods to the photographic identification catalog. Although the best model, out
of a large number evaluated, included significant time variation in survival and birth
rates, here I will analyze a single matrix obtained from a time-invariant model. The
complete transient matrix U and the fertility matrix F are


$$
\mathbf{U} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0.90 & 0.85 & 0 & 0 & 0 \\
0 & 0.12 & 0.71 & 0 & 1.00 \\
0 & 0 & 0.29 & 0 & 0 \\
0 & 0 & 0 & 0.85 & 0
\end{pmatrix} \tag{5.4}
$$


$$
\mathbf{F} = \begin{pmatrix}
0 & 0 & 0.13 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix} \tag{5.5}
$$

5.2.2 Occupancy Times and the Fundamental Matrix 


As the syllogism asserts, all men are mortal; absorption is certain. Our question is,
how long does absorption take and what happens en route? From a demographic
perspective, this is asking about the lifespan of an individual and the events that
happen during that lifetime. The key to these questions is the fundamental matrix of
the absorbing Markov chain. Consider an individual presently in transient state $j$.
As time passes, it will visit other transient states, repeating some, skipping others,
until it eventually dies. Let $\nu_{ij}$ denote the number of visits to, or the occupancy time
in, transient state $i$ that our individual, starting in transient state $j$, makes before
being absorbed. The $\nu_{ij}$ are random variables, reflecting individual stochasticity.

The entries of the matrix U give the probabilities of visiting each of the transient
states after one time step. The entries of $\mathbf{U}^2$ give the probabilities of visiting each of
the transient states after two time steps. Adding the powers of U gives the expected
number of visits to each transient state, over a lifetime, in a matrix N; i.e.,


$$ \mathbf{N} = \left( E(\nu_{ij}) \right) = \sum_{t=0}^{\infty} \mathbf{U}^t = (\mathbf{I} - \mathbf{U})^{-1}. \tag{5.6} $$


The right whale. The fundamental matrix for the right whale is calculated from
(5.6) to be


$$
\mathbf{N} = \begin{pmatrix}
1.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
5.88 & 6.52 & 0.00 & 0.00 & 0.00 \\
16.35 & 18.11 & 22.94 & 19.49 & 22.94 \\
4.74 & 5.25 & 6.65 & 6.65 & 6.65 \\
4.02 & 4.46 & 5.65 & 5.65 & 6.65
\end{pmatrix}. \tag{5.7}
$$


The first column corresponds to calves. On average, a calf will spend 1 year as a calf,
5.9 years as a juvenile, 16.3 years as a mature but non-breeding female, etc. Row
4 of N is of particular interest. Stage 4 represents mothers, so $n_{4j}$ is the expected
number of reproductive events that a female in stage $j$ will experience during her
remaining lifetime. Based on this model, a newborn calf could expect to give birth
$n_{41} = 4.74$ times. A mature female could expect to give birth $n_{43} = 6.65$ times; the
difference reflects the likelihood of mortality between birth and maturity.²
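
The calculation in (5.6) can be sketched in a few lines of Python/NumPy. This is an illustration using the rounded entries of (5.4), so the result is close to, but not identical with, the matrix in (5.7), which was presumably computed from unrounded estimates. The truncation of the series at 3000 terms is an arbitrary choice made here.

```python
import numpy as np

# Transient matrix U, rounded values from (5.4)
U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
s = U.shape[0]

# Fundamental matrix N = (I - U)^{-1}
N = np.linalg.inv(np.eye(s) - U)

# The same matrix as a (truncated) sum of powers of U
partial = np.zeros((s, s))
term = np.eye(s)            # U^0
for _ in range(3000):
    partial += term
    term = U @ term         # next power of U

print(np.round(N, 2))
print("expected reproductive events for a calf, n_41:", round(float(N[3, 0]), 2))
print("max difference between inverse and series:", np.abs(N - partial).max())
```

Entry `N[i, j]` (zero-based) is the expected number of visits to stage i+1 by an individual starting in stage j+1, so row index 3 corresponds to mothers.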

We would like to know how the entries of N vary in response to changes in the 
vital rates. To accomplish this, we need matrix calculus, which is the topic of the 
next section. 


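² Note that $n_{43} = n_{44} = n_{45} = n_{55}$ and $n_{53} = n_{54}$. This seems to be due to the fact, specific to these
data, that the survival probability of stages 3 and 5 is indistinguishable from 1.0; it influences the
results below.


5.2.3 Sensitivity of the Fundamental Matrix


Let us apply matrix calculus to find the sensitivity of the fundamental matrix N
(Caswell 2006). This result will appear in the sensitivity analysis of most other
demographic quantities. Let $\boldsymbol{\theta}$ be a vector of parameters (of dimension $p \times 1$) on
which the entries of the transition matrix U depend. The fundamental matrix satisfies

$$ \mathbf{I} = \mathbf{N} \mathbf{N}^{-1}. \tag{5.8} $$

Differentiating both sides gives


$$ \mathbf{0} = (d\mathbf{N}) \mathbf{N}^{-1} + \mathbf{N} \left( d\mathbf{N}^{-1} \right). \tag{5.9} $$


Applying the vec operator and Roth’s theorem to both sides gives

$$ \mathrm{vec}\,\mathbf{0} = \left[ \left( \mathbf{N}^{-1} \right)^{\mathsf T} \otimes \mathbf{I}_s \right] d\,\mathrm{vec}\,\mathbf{N} + \left( \mathbf{I}_s \otimes \mathbf{N} \right) d\,\mathrm{vec}\,\mathbf{N}^{-1} \tag{5.10} $$


Because $\mathbf{N}^{-1} = \mathbf{I} - \mathbf{U}$, we have $d\,\mathrm{vec}\,\mathbf{N}^{-1} = -d\,\mathrm{vec}\,\mathbf{U}$. Solving for $d\,\mathrm{vec}\,\mathbf{N}$ gives


$$ d\,\mathrm{vec}\,\mathbf{N} = \left[ \left( \mathbf{N}^{-1} \right)^{\mathsf T} \otimes \mathbf{I}_s \right]^{-1} \left( \mathbf{I}_s \otimes \mathbf{N} \right) d\,\mathrm{vec}\,\mathbf{U} \tag{5.11} $$


To simplify this, it helps to know two facts about the Kronecker product:


$$ (\mathbf{A} \otimes \mathbf{B})^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \tag{5.12} $$
$$ (\mathbf{A} \otimes \mathbf{B}) (\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D}) \tag{5.13} $$


provided that the sizes of the matrices permit the indicated operations. Thus $d\,\mathrm{vec}\,\mathbf{N}$
in (5.11) simplifies to


$$ d\,\mathrm{vec}\,\mathbf{N} = \left( \mathbf{N}^{\mathsf T} \otimes \mathbf{N} \right) d\,\mathrm{vec}\,\mathbf{U} \tag{5.14} $$


The identification theorem (2.47) implies


$$ \frac{d\,\mathrm{vec}\,\mathbf{N}}{d\,\mathrm{vec}^{\mathsf T}\,\mathbf{U}} = \mathbf{N}^{\mathsf T} \otimes \mathbf{N} \tag{5.15} $$

and the chain rule permits us to write

$$ \frac{d\,\mathrm{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf T}} = \left( \mathbf{N}^{\mathsf T} \otimes \mathbf{N} \right) \frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf T}} \tag{5.16} $$

Fig. 5.2 Elasticity, to each of the vital rates, of reproductive outcomes in the right whale. (a) The
elasticity of the expected lifetime number of reproductive events, $E(\nu_{41})$. (b) The elasticity of
the variance in the lifetime number of reproductive events, $V(\nu_{41})$. Vital rates: s1–s4 are survival
probabilities (s5 = s3 by assumption in this model); g2 is the probability of maturation, and g3 is
the probability of reproduction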


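The left-hand side of (5.16) is a matrix, of dimension $s^2 \times p$, containing the
sensitivity of every entry of N to every parameter in $\boldsymbol{\theta}$. The matrix $d\,\mathrm{vec}\,\mathbf{U} / d\boldsymbol{\theta}^{\mathsf T}$
is an $s^2 \times p$ matrix containing the sensitivities of all the elements of U to all the
elements of $\boldsymbol{\theta}$. From (2.55), the elasticity of the fundamental matrix is given by


$$ \frac{\epsilon\,\mathrm{vec}\,\mathbf{N}}{\epsilon \boldsymbol{\theta}^{\mathsf T}} = \mathcal{D}(\mathrm{vec}\,\mathbf{N})^{-1} \, \frac{d\,\mathrm{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf T}} \, \mathcal{D}(\boldsymbol{\theta}) \tag{5.17} $$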


The right whale. As an example, we use (5.16) and (5.17) to calculate the elasticity
of the expected lifetime number of reproductive events, $E(\nu_{41}) = n_{41}$, with respect
to the survival probabilities $\sigma_1, \ldots, \sigma_4$, the maturation probability $\gamma_2$, and the
breeding probability $\gamma_3$. Figure 5.2 shows that the number of breeding events is
most elastic to mature female survival ($\sigma_3$), and less so to the survival of immature
females or mothers ($\sigma_2$ and $\sigma_4$). Changes in the probability of giving birth, $\gamma_3$, have,
remarkably enough, no impact on the expected number of reproductive events.

The elasticity of $n_{41}$ to $\sigma_3$ (survival of mature females) is approximately 30. This
implies that a 1% increase in $\sigma_3$ would produce about a 30% increase in the expected
number of reproductive events.
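
The elasticity calculation in (5.16) and (5.17) can be sketched as follows. This Python/NumPy example is an illustration under loud assumptions: the parameter values are back-calculated here from the rounded entries of (5.4) using the structure of (5.1), not taken from the original estimates, and $d\,\mathrm{vec}\,\mathbf{U}/d\boldsymbol{\theta}^{\mathsf T}$ is obtained by central differences rather than written out analytically. The helper names `build_U`, `vec`, `fundamental`, and `jacobian` are hypothetical.

```python
import numpy as np

def build_U(theta):
    """Transient part of (5.1), with sigma5 constrained to equal sigma3."""
    s1, s2, s3, s4, g2, g3 = theta
    return np.array([
        [0.0, 0.0,           0.0,           0.0, 0.0],
        [s1,  s2 * (1 - g2), 0.0,           0.0, 0.0],
        [0.0, s2 * g2,       s3 * (1 - g3), 0.0, s3],
        [0.0, 0.0,           s3 * g3,       0.0, 0.0],
        [0.0, 0.0,           0.0,           s4,  0.0],
    ])

def vec(M):
    """Stack the columns of M (column-major vec operator)."""
    return M.flatten(order="F")

def fundamental(theta):
    U = build_U(theta)
    return np.linalg.inv(np.eye(U.shape[0]) - U)

def jacobian(f, theta, h=1e-6):
    """Central-difference approximation to d f / d theta^T."""
    J = np.zeros((f(theta).size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        J[:, j] = (f(theta + step) - f(theta - step)) / (2 * h)
    return J

# ASSUMPTION: values back-calculated from the rounded matrix in (5.4)
names = ["sigma1", "sigma2", "sigma3", "sigma4", "gamma2", "gamma3"]
theta = np.array([0.90, 0.97, 1.00, 0.85, 0.12 / 0.97, 0.29])

N = fundamental(theta)
s = N.shape[0]

dvecU = jacobian(lambda th: vec(build_U(th)), theta)   # s^2 x p
dvecN = np.kron(N.T, N) @ dvecU                        # Eq. (5.16)

# Position of n_41 in vec N: column-major index of row 4, column 1
row = (1 - 1) * s + (4 - 1)
elasticity_n41 = dvecN[row] * theta / N[3, 0]          # one row of Eq. (5.17)

# Independent check: differentiate N(theta) directly
dvecN_fd = jacobian(lambda th: vec(fundamental(th)), theta)

for name, e in zip(names, elasticity_n41):
    print(f"{name}: {e:8.3f}")
```

Under these assumed values, the elasticity to $\sigma_3$ is by far the largest and the elasticity to $\gamma_3$ is numerically zero, which is the qualitative pattern described above. Only one row of (5.17) is computed because vec N contains zeros, which would make the full $\mathcal{D}(\mathrm{vec}\,\mathbf{N})^{-1}$ undefined.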


5.3 From Stage to Age 


The fundamental matrix summarizes the age-specific information implicit in the
transient matrix U, even if the model is stage-classified and age does not appear
explicitly. We now extend this to explore a series of age-specific demographic
indices and their sensitivity analyses. Some are well known ($R_0$, generation time),
others little explored (variance in longevity, for example). They can, however, all be
easily calculated from any stage-classified model.


5.3.1 Variance in Occupancy Time 


The occupancy time in any transient state is a random variable; the fundamental
matrix N gives its mean. Some individuals will visit that state more often, some
less often, some not at all. This basic property of individual stochasticity can be
described by the variance of $\nu_{ij}$. Iosifescu (1980, Theorem 3.1) gives a formula for
all the moments of the $\nu_{ij}$; from this we can calculate the matrix of variances


$$ \mathbf{V} = \left( V(\nu_{ij}) \right) = \left( 2 \mathbf{N}_{\rm dg} - \mathbf{I} \right) \mathbf{N} - \mathbf{N} \circ \mathbf{N} \tag{5.18} $$


(Caswell 2006) where $\circ$ denotes the Hadamard, or element-by-element, product and
$\mathbf{N}_{\rm dg}$ is a matrix with the diagonal elements of N on its diagonal and zeros elsewhere.
The standard deviations of the occupancy times are the square roots of the elements
of V.
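
A minimal Python/NumPy sketch of (5.18) follows, again using the rounded matrix from (5.4), so the numbers will differ slightly from those reported for the right whale. Clipping small negative values before taking square roots is a numerical safeguard added here, not part of the formula.

```python
import numpy as np

# Transient matrix U, rounded values from (5.4)
U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
s = U.shape[0]
I = np.eye(s)

N = np.linalg.inv(I - U)          # mean occupancy times, Eq. (5.6)
N_dg = np.diag(np.diag(N))        # diagonal of N, zeros elsewhere

# Variances of occupancy times, Eq. (5.18); N * N is the Hadamard product
V = (2 * N_dg - I) @ N - N * N

# Standard deviations (clip guards against tiny negative rounding errors)
SD = np.sqrt(np.clip(V, 0.0, None))

print("variance matrix V:")
print(np.round(V, 2))
print("standard deviations:")
print(np.round(SD, 2))
```

Note the distinction between `@` (matrix product) and `*` (element-by-element product); confusing the two is the most likely error when transcribing (5.18).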


The right whale. For the right whale, the matrix of variances calculated from
(5.18) is


$$
\mathbf{V} = \begin{pmatrix}
0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
36.18 & 35.95 & 0.00 & 0.00 & 0.00 \\
466.44 & 484.80 & 503.32 & 494.86 & 503.32 \\
35.80 & 36.98 & 37.54 & 37.54 & 37.54 \\
33.28 & 34.94 & 37.54 & 37.54 & 37.54
\end{pmatrix}, \tag{5.19}
$$


and the corresponding standard deviations are


$$
\left( SD(\nu_{ij}) \right) = \begin{pmatrix}
0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
6.02 & 6.00 & 0.00 & 0.00 & 0.00 \\
21.60 & 22.02 & 22.43 & 22.25 & 22.43 \\
5.98 & 6.08 & 6.13 & 6.13 & 6.13 \\
5.77 & 5.91 & 6.13 & 6.13 & 6.13
\end{pmatrix}. \tag{5.20}
$$


The variance in the $\nu_{ij}$ is the result of luck, not heterogeneity. That is, it is
the variance among a group of individuals all experiencing exactly the same
stage-specific transition and mortality probabilities in U. As such, it can provide
a null model for studies of heterogeneity in quantities such as the number of
reproductive events. This idea has been explored independently, and in more detail,
by Tuljapurkar and colleagues (Tuljapurkar et al. 2009; Steiner and Tuljapurkar
2012).

The sensitivity of the variance is derived in Appendix A.1 as


$$ \frac{d\,\mathrm{vec}\,\mathbf{V}}{d\boldsymbol{\theta}^{\mathsf T}} = \left[ 2 \left( \mathbf{N}^{\mathsf T} \otimes \mathbf{I}_s \right) \mathcal{D}(\mathrm{vec}\,\mathbf{I}_s) + 2 \left( \mathbf{I}_s \otimes \mathbf{N}_{\rm dg} \right) - \mathbf{I}_{s^2} - 2 \mathcal{D}(\mathrm{vec}\,\mathbf{N}) \right] \frac{d\,\mathrm{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^{\mathsf T}} \tag{5.21} $$


Elasticities of V are calculated using (2.55).


Hint. Before looking at Appendix A.1, to derive (5.21), write $\mathbf{N}_{\rm dg} = \mathbf{I} \circ \mathbf{N}$,
differentiate (5.18), and use the fact that $\mathrm{vec}\,(\mathbf{A} \circ \mathbf{B}) = \mathcal{D}(\mathrm{vec}\,\mathbf{A})\,\mathrm{vec}\,\mathbf{B} =
\mathcal{D}(\mathrm{vec}\,\mathbf{B})\,\mathrm{vec}\,\mathbf{A}$.


The right whale. The elasticities of $V(\nu_{41})$, calculated from (5.21) and (5.17), are
shown in Fig. 5.2b. They are roughly proportional to the elasticities of $E(\nu_{41})$; that
is, the vital rates that have large effects on the expected number of reproductive
events also have large effects on the variance.
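
The claim that this variance arises from luck alone can be illustrated by simulation. The following Python/NumPy sketch follows a cohort of identical calves through the absorbing chain and compares the simulated mean and variance of the number of visits to stage 4 with (5.6) and (5.18). It is an illustration only: the matrix is the rounded one from (5.4), and the cohort size and random seed are arbitrary choices made here.

```python
import numpy as np

# Transient matrix U, rounded values from (5.4)
U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
s = U.shape[0]
I = np.eye(s)

# Analytical mean and variance of occupancy times, (5.6) and (5.18)
N = np.linalg.inv(I - U)
V = (2 * np.diag(np.diag(N)) - I) @ N - N * N

# Column-stochastic absorbing chain as in (5.3); index s denotes death
m = np.clip(1.0 - U.sum(axis=0), 0.0, None)
P = np.block([[U, np.zeros((s, 1))], [m.reshape(1, s), np.ones((1, 1))]])
cumP = np.cumsum(P, axis=0)            # cumulative probabilities per column

rng = np.random.default_rng(2009)      # arbitrary seed
n_ind = 20000                          # arbitrary cohort size
state = np.zeros(n_ind, dtype=int)     # every individual starts as a calf
visits_to_4 = np.zeros(n_ind)

while np.any(state < s):
    visits_to_4 += (state == 3)        # stage 4 (mothers) has index 3
    u = rng.random(n_ind)
    # Next state = number of cumulative probabilities that u has passed
    state = np.minimum((u[None, :] >= cumP[:, state]).sum(axis=0), s)

print("simulated  mean, variance:", visits_to_4.mean(), visits_to_4.var(ddof=1))
print("analytical mean, variance:", N[3, 0], V[3, 0])
```

Every simulated individual experiences exactly the same probabilities, yet the number of reproductive events varies widely among them; that spread is what V measures.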


5.3.2 Longevity and Life Expectancy 


Longevity is an important demographic characteristic (Carey 2003). Mean
longevity, or life expectancy, is one of the most widely reported demographic
statistics, used to compare populations, species, countries, regions, historical
periods, etc., and to examine the effects of evolutionary, management, medical,
and social processes. The longevity of an individual is the sum of the time spent in
all of the transient states before final absorption. Let the random variable $\eta_j$ denote
the longevity of an individual currently in stage $j$. Then


$$ \eta_j = \sum_i \nu_{ij}. \tag{5.22} $$


A vector $E(\boldsymbol{\eta})$ of expected longevities, or life expectancies, is obtained by summing
the columns of N:


$$ E(\boldsymbol{\eta}^{\mathsf T}) = \mathbf{1}^{\mathsf T} \mathbf{N} \tag{5.23} $$

where $\mathbf{1}$ is a vector of ones. Often, life expectancy at birth is of primary interest.
If stages are numbered so that birth corresponds to stage 1, then life expectancy at
birth is

$$ E(\eta_1) = \mathbf{1}^{\mathsf T} \mathbf{N} \mathbf{e}_1, \tag{5.24} $$


where $\mathbf{e}_1$ is a vector with 1 in the first entry and zeros elsewhere.

The sensitivity of life expectancy in age-classified models has been studied by
Pollard (1982) and Keyfitz (1971); see Keyfitz and Caswell (2005, Section 4.3),
Vaupel (1986), and Vaupel and Canudas Romo (2003).

For more general stage-classified models, the sensitivity of $E(\boldsymbol{\eta})$ is (Caswell
2006)


$$ \frac{d E(\boldsymbol{\eta})}{d\boldsymbol{\theta}^{\mathsf T}} = \left( \mathbf{I}_s \otimes \mathbf{1}^{\mathsf T} \right) \left( \mathbf{N}^{\mathsf T} \otimes \mathbf{N} \right) \frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf T}} \tag{5.25} $$


Hint. To obtain (5.25), differentiate both sides of (5.23), apply the vec operator, and
use (5.16) for the derivative of N. See Appendix A.2 for the derivation.
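
Equations (5.23) to (5.25) can be checked numerically. The sketch below is a Python/NumPy illustration using the rounded matrix from (5.4); it takes the parameter vector to be $\mathrm{vec}\,\mathbf{U}$ itself, so that $d\,\mathrm{vec}\,\mathbf{U}/d\boldsymbol{\theta}^{\mathsf T}$ is the identity, and compares one column of (5.25) with a forward-difference perturbation of a single entry of U. The choice of entry and step size is arbitrary.

```python
import numpy as np

# Transient matrix U, rounded values from (5.4)
U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
s = U.shape[0]
I = np.eye(s)
ones = np.ones(s)

N = np.linalg.inv(I - U)
eta = ones @ N                      # life expectancies, Eq. (5.23)
e1 = I[:, 0]
eta_birth = ones @ N @ e1           # life expectancy at birth, Eq. (5.24)

# Eq. (5.25) with theta = vec U, so dvec U / d theta^T is the identity
d_eta = np.kron(I, ones.reshape(1, s)) @ np.kron(N.T, N)   # s x s^2

# Finite-difference check for the entry u_43 (zero-based row 3, column 2)
i, j = 3, 2
h = 1e-7
U_pert = U.copy()
U_pert[i, j] += h
fd = (ones @ np.linalg.inv(I - U_pert) - eta) / h
analytic = d_eta[:, j * s + i]      # column-major position of u_43 in vec U

print("life expectancies by stage:", np.round(eta, 1))
print("analytical sensitivity to u_43:", np.round(analytic, 2))
print("finite-difference estimate:   ", np.round(fd, 2))
```

For sensitivities to lower-level parameters such as survival probabilities, the identity would be replaced by the appropriate $d\,\mathrm{vec}\,\mathbf{U}/d\boldsymbol{\theta}^{\mathsf T}$, as in (5.16).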


The right whale. For the right whale, the vector of life expectancies is

$$ E(\boldsymbol{\eta}^{\mathsf T}) = \begin{pmatrix} 32.0 & 34.3 & 35.2 & 31.8 & 36.2 \end{pmatrix} \tag{5.26} $$


Because mortality rates vary relatively little among stages, the life expectancies of
the stages differ by only about 15%. The life expectancy for a calf implied by these
data is 32 years. The elasticities of life expectancy to the vital rates are shown in
Fig. 5.3. Life expectancy is most elastic to mature female survival $\sigma_3$, and less so to
$\sigma_2$ and $\sigma_4$. This partly reflects the longer amount of time spent as a mature female,
compared to an immature female or mother; see (5.7). The elasticity to the birth rate
$\gamma_3$ is negative, because of the reduced survival of mothers. A 1% increase in $\gamma_3$ will
lead to a 0.51% decrease in life expectancy. This is one possible measure of the cost
of reproduction.

Fig. 5.3 Elasticities of longevity for the right whale. (a) The elasticity, to each of the vital rates,
of life expectancy for a female right whale calf. (b) The elasticity of the variance in longevity for
a female right whale calf. Parameters as in Fig. 5.2


5.3.3 Variance in Longevity


Like the occupancy time in a transient state, longevity is a random variable, the 
variability of which is a measure of individual stochasticity. Individuals differ in 
longevity depending on the pathways taken from birth to death. This variance has 
been explored by human demographers, using life table methods, as one way of 
studying the inequality in life span generated by a given mortality schedule, and how 
that inequality has changed over time (e.g., Wilmoth and Horiuchi 1999; Shkolnikov 
et al. 2003; Edwards and Tuljapurkar 2005; Van Raalte and Caswell 2013). 
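The variance of the time to absorption is


$$ V(\boldsymbol{\eta}^{\mathsf T}) = \mathbf{1}^{\mathsf T} \mathbf{N} \left( 2\mathbf{N} - \mathbf{I} \right) - E(\boldsymbol{\eta}^{\mathsf T}) \circ E(\boldsymbol{\eta}^{\mathsf T}) \tag{5.27} $$


(Caswell 2006; Iosifescu 1980).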
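
Equation (5.27) can be checked against the distribution of age at death implied by U. The Python/NumPy sketch below is an illustration using the rounded matrix from (5.4). The independent check uses the fact that the probability of dying after exactly $x$ time steps is $\mathbf{1}^{\mathsf T}\mathbf{U}^{x-1} - \mathbf{1}^{\mathsf T}\mathbf{U}^{x}$; the truncation at 5000 steps is an arbitrary choice made here.

```python
import numpy as np

# Transient matrix U, rounded values from (5.4)
U = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [0.90, 0.85, 0.00, 0.00, 0.00],
    [0.00, 0.12, 0.71, 0.00, 1.00],
    [0.00, 0.00, 0.29, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.85, 0.00],
])
s = U.shape[0]
I = np.eye(s)
ones = np.ones(s)

N = np.linalg.inv(I - U)
eta = ones @ N                                    # Eq. (5.23)
var_eta = ones @ N @ (2 * N - I) - eta * eta      # Eq. (5.27)
sd_eta = np.sqrt(var_eta)

# Independent check from the distribution of longevity
alive = ones.copy()               # 1^T U^0: probability of being alive at x = 0
mean_chk = np.zeros(s)
second_chk = np.zeros(s)
for x in range(1, 5001):
    nxt = alive @ U               # 1^T U^x
    p_die = alive - nxt           # P(longevity = x), one entry per starting stage
    mean_chk += x * p_die
    second_chk += x**2 * p_die
    alive = nxt
var_chk = second_chk - mean_chk**2

print("variance from (5.27):     ", np.round(var_eta, 1))
print("variance from distribution:", np.round(var_chk, 1))
print("standard deviation:        ", np.round(sd_eta, 1))
```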
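The sensitivity of the variance in longevity is


$$ \frac{d V(\boldsymbol{\eta})}{d\boldsymbol{\theta}^{\mathsf T}} = \left[ 2 \left( \mathbf{N}^{\mathsf T} \otimes \mathbf{1}^{\mathsf T} \right) + 2 \left( \mathbf{I}_s \otimes \mathbf{1}^{\mathsf T} \mathbf{N} \right) - \left( \mathbf{I}_s \otimes \mathbf{1}^{\mathsf T} \right) - 2 \mathcal{D}\left( E(\boldsymbol{\eta}) \right) \left( \mathbf{I}_s \otimes \mathbf{1}^{\mathsf T} \right) \right] \left( \mathbf{N}^{\mathsf T} \otimes \mathbf{N} \right) \frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf T}} \tag{5.28} $$


The first row of (5.28) contains the sensitivities of the variance in longevity starting in
stage 1.


Hint. To derive (5.28), differentiate (5.27) and apply the vec operator and Roth’s
theorem to each term, using (5.25) for the derivative of $E(\boldsymbol{\eta})$. See Sect. A.3 for
details.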


The right whale. For the right whale, the variance and standard deviation of
longevity are given by


$$ V(\boldsymbol{\eta})^{\mathsf T} = \begin{pmatrix} 1157 & 1167 & 1172 & 1163 & 1172 \end{pmatrix} \tag{5.29} $$
$$ SD(\boldsymbol{\eta})^{\mathsf T} = \begin{pmatrix} 34.0 & 34.2 & 34.2 & 34.1 & 34.2 \end{pmatrix} \tag{5.30} $$


The life expectancy at birth of 32 years has a standard deviation of about 34 years.
Note that this result implies a very long positive tail of longevity. The interpretation
of this result is tricky; I will return to it in Sect. 5.7.

The elasticities of the variance of longevity of a calf are shown in Fig. 5.3b. The
variance in longevity is increased by increases in $\sigma_3$, less so by increases in $\sigma_2$ and
$\sigma_4$. The pattern of the elasticities is strikingly similar to that of the elasticities of
$E(\boldsymbol{\eta})$.


5.3.4 Cohort Generation Time


Generation time measures the typical age at which offspring are produced, or the 
age at which the typical offspring is produced. It appears in the IUCN criteria 
for classifying threatened species (IUCN Species Survival Commission 2001) 
as well as in various evolutionary considerations. There are several definitions 
of generation time (Coale 1972); here we will examine the cohort generation 
time, defined as the mean age of production of offspring in a cohort of newborn 
individuals. From the definition it is clear why calculation of generation time is a 
problem in stage-classified models, in which the age of parents does not appear. 
Moreover, in stage-classified models, individuals may be born into several stages 
(e.g., cleistogamous vs. chasmogamous seeds; LeCorff and Horvitz 2005), each
with a different subsequent pattern of development, survival, and fertility. There 
could be a different generation time for each type of offspring, and if individuals 
may produce more than one type of offspring, the average age at which they are 
produced could differ from one kind of offspring to another. 

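Thus, we expect to have a generation time that measures the mean age of
production of offspring of type i by an individual born in stage j. Write this as
a vector $\boldsymbol{\mu}^{(j)}$. Then it can be shown (Sect. A.5) that


$$\boldsymbol{\mu}^{(j)} = \mathcal{D}\left(\mathbf{F}\mathbf{N}\mathbf{e}_j\right)^{-1} \mathbf{F}\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_j \tag{5.31}$$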
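
The calculation in (5.31) can be sketched in a few lines. The function name
`cohort_generation_time` and the three-stage matrices are assumptions for illustration.
Returning NaN for offspring types that are never produced is a choice made here, since the
corresponding diagonal entry of the matrix being inverted is zero.

```python
import numpy as np

def cohort_generation_time(U, F, j):
    """Mean age at production of each offspring type, for a cohort born in stage j.

    Implements mu^(j) = D(F N e_j)^{-1} F N U N e_j from (5.31), with age 0 at birth.
    """
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    e_j = np.zeros(s)
    e_j[j] = 1.0
    lifetime = F @ N @ e_j                 # expected lifetime production, by type
    age_weighted = F @ N @ U @ N @ e_j     # the same sum, weighted by parental age
    mu = np.full(s, np.nan)
    made = lifetime > 0                    # types actually produced
    mu[made] = age_weighted[made] / lifetime[made]
    return mu

# Hypothetical three-stage life cycle in which only stage 3 reproduces.
U_example = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.6, 0.0],
                      [0.0, 0.2, 0.9]])
F_example = np.array([[0.0, 0.0, 1.5],
                      [0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])
mu = cohort_generation_time(U_example, F_example, j=0)
print("generation time for offspring of type 1:", mu[0])
```

The result can be checked against the direct sum over ages x of x F U^x e_j, divided by the
sum of F U^x e_j.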


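The sensitivity of $\boldsymbol{\mu}^{(j)}$ is obtained by a methodical application of matrix calculus
to (5.31). To simplify notation, define


$$\mathbf{X} = \mathcal{D}\left(\mathbf{F}\mathbf{N}\mathbf{e}_j\right) \tag{5.32}$$
$$\mathbf{r} = \mathbf{F}\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_j \tag{5.33}$$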


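The resulting sensitivity of $\boldsymbol{\mu}^{(j)}$ is


$$
\begin{aligned}
\frac{d\boldsymbol{\mu}^{(j)}}{d\boldsymbol{\theta}^\top} = {} & -\left(\mathbf{r}^\top \otimes \mathbf{I}_s\right)\left(\mathbf{X}^{-1} \otimes \mathbf{X}^{-1}\right)\mathcal{D}\left(\mathrm{vec}\,\mathbf{I}_s\right)\left(\mathbf{1}_s \otimes \mathbf{I}_s\right) \\
& \times \left[ \left(\mathbf{e}_j^\top \mathbf{N}^\top \otimes \mathbf{I}_s\right)\frac{d\,\mathrm{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^\top} + \left(\mathbf{e}_j^\top \otimes \mathbf{F}\right)\frac{d\,\mathrm{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^\top} \right] \\
& + \mathbf{X}^{-1}\left\{ \left[(\mathbf{N}\mathbf{U}\mathbf{N}\mathbf{e}_j)^\top \otimes \mathbf{I}_s\right]\frac{d\,\mathrm{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^\top} + \left[(\mathbf{U}\mathbf{N}\mathbf{e}_j)^\top \otimes \mathbf{F}\right]\frac{d\,\mathrm{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^\top} \right. \\
& \left. \qquad + \left[(\mathbf{N}\mathbf{e}_j)^\top \otimes \mathbf{F}\mathbf{N}\right]\frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^\top} + \left[\mathbf{e}_j^\top \otimes \mathbf{F}\mathbf{N}\mathbf{U}\right]\frac{d\,\mathrm{vec}\,\mathbf{N}}{d\boldsymbol{\theta}^\top} \right\}
\end{aligned}
\tag{5.34}
$$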
Fig. 5.4 The elasticity, to each of the vital rates, of the cohort generation time for a
newborn calf right whale. Parameters as in Fig. 5.2
Hint To derive (5.34), it helps to note that, for any vector z, one can write
$\mathcal{D}(\mathbf{z}) = \mathbf{I} \circ \left(\mathbf{z}\mathbf{1}^\top\right)$. Apply this to X, differentiate all the terms in $\boldsymbol{\mu}^{(j)}$, and apply the vec operator.
With any luck, you will come out to this answer. See Sect. A.5.1 for derivation.


The right whale The elasticities of the generation time $\boldsymbol{\mu}^{(1)}$ of a calf are shown in
Fig. 5.4. Changes in early survival ($\sigma_1$ and $\sigma_2$) have little effect. Adult survival $\sigma_3$
and, to a lesser extent, $\sigma_4$ increase the generation time by extending the reproductive
lifespan. The maturation probability $\gamma_2$ and the birth probability $\gamma_3$ have negative
effects on generation time, because they speed up reproduction.


5.4 The Net Reproductive Rate 


In age-classified demography, the net reproductive rate $R_0$ measures lifetime
reproductive output. It also appears in epidemiology, where it measures the potential
of a disease to spread (e.g., Diekmann et al. 1990; van den Driessche and Watmough
2002). The classical net reproductive rate satisfies three conditions:


C1: $R_0$ measures the expected lifetime production of offspring.

C2: $R_0$ measures the rate of increase per generation (in contrast to the rate of
increase per unit of time, which is given by $\lambda$ or r).

C3: $R_0$ is an indicator function for population persistence. If $R_0 > 1$ then an
individual will, on average, produce more than enough offspring to replace
itself, the next generation will be larger than the present generation, and the
population will grow. If $R_0 < 1$, each generation is smaller than the one
before, and the population will decline to extinction.


In classical demography (Lotka 1939; Rhodes 1940),


$$R_0 = \int_0^\infty \ell(x)\, m(x)\, dx \tag{5.35}$$

where $\ell(x)$ is survivorship to age x and m(x) is the maternity function. It is not
difficult to show that $R_0$ defined in this way satisfies conditions C1, C2, and C3.

In stage-classified models, however, the calculation of $R_0$ must account for the
multiple pathways that an individual may follow through the life cycle, and the
production of multiple kinds of offspring along each of these pathways. Rogers (1974;
see also Lebreton 1996) considered $R_0$ in the context of an age-classified population
distributed across a set of spatial regions. However, these calculations assume that
age-specific survival and fertility schedules are available for each region. A more
general solution was provided by Cushing and Zhou (1994) for stage-classified
populations with no age-specific information. Their analysis produces an index that
satisfies as many as possible of the conditions C1, C2, and C3. de Camino-Beck and
Lewis (2007, 2008) have derived graph-theoretic ways to calculate $R_0$.

Consider an initial cohort at t = 0 with structure $\mathbf{x}_0$, and call this the first
generation. This cohort will produce offspring according to $\mathbf{F}\mathbf{x}_0$. The survivors of
the cohort at t = 1 will produce offspring according to $\mathbf{F}\mathbf{U}\mathbf{x}_0$. The survivors at
t = 2 will produce offspring $\mathbf{F}\mathbf{U}^2\mathbf{x}_0$, and so on. The second generation is composed
of all the offspring of the first generation, obtained by summing over the lifetime of
the cohort


$$\mathbf{x}(1) = \left(\sum_{i=0}^{\infty} \mathbf{F}\mathbf{U}^i\right)\mathbf{x}_0 = (\mathbf{F}\mathbf{N})\,\mathbf{x}_0 \tag{5.36}$$

Iterating this process leads to a model for the growth from one generation to the next

$$\mathbf{x}(k+1) = \mathbf{F}\mathbf{N}\,\mathbf{x}(k) \tag{5.37}$$


Cushing and Zhou (1994) define $R_0$ as the per-generation growth rate, given by the
dominant eigenvalue $\rho$ of FN,


$$R_0 = \rho\left[\mathbf{F}\mathbf{N}\right] \tag{5.38}$$


Thus the Cushing-Zhou measure of $R_0$ clearly satisfies condition C2. Cushing and
Zhou (1994) also prove (their Theorem 3) that $R_0$ defined in this way is less than,
equal to, or greater than 1 if and only if $\lambda$ is less than, equal to, or greater than one,
respectively, thus satisfying condition C3.
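
A minimal numerical sketch of (5.38) and of condition C3 follows. The matrices are the same
kind of hypothetical three-stage example used for illustration elsewhere, not the right
whale model, and the function names are assumptions.

```python
import numpy as np

def net_reproductive_rate(U, F):
    """R0 as the dominant eigenvalue of the generation matrix F N, as in (5.38)."""
    N = np.linalg.inv(np.eye(U.shape[0]) - U)
    return np.max(np.abs(np.linalg.eigvals(F @ N)))

def growth_rate(U, F):
    """Population growth rate lambda: dominant eigenvalue of A = U + F."""
    return np.max(np.abs(np.linalg.eigvals(U + F)))

# Hypothetical life cycle with a single offspring type (stage 1).
U_example = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.6, 0.0],
                      [0.0, 0.2, 0.9]])
F_example = np.array([[0.0, 0.0, 1.5],
                      [0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])

for scale in (1.0, 0.2):                    # high and low fertility
    R0 = net_reproductive_rate(U_example, scale * F_example)
    lam = growth_rate(U_example, scale * F_example)
    print(f"fertility x {scale}: R0 = {R0:.3f}, lambda = {lam:.3f}")
```

In both cases R0 and lambda fall on the same side of 1, as the theorem requires, although
their magnitudes differ because one is measured per generation and the other per time step.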

The relation between lifetime offspring production and $R_0$ (condition C1) is more
complicated when the life cycle contains multiple types of offspring. If only a single
type of offspring is produced (call it stage 1), then F will have nonzero entries only in
its first row, and FN will be upper triangular, with its dominant eigenvalue appearing
in the (1, 1) position. That entry is the sum of the fertilities of each stage weighted by the
expected time spent in that stage. This is precisely the expected lifetime offspring
production, so for the case of a single type of offspring, the Cushing-Zhou $R_0$ also
satisfies C1.
However, if the life cycle contains multiple types of offspring (say stages
1, ..., h), the upper left h × h corner of FN will contain the expected lifetime
production of offspring of types 1, ..., h by individuals starting life as types
1, ..., h. Since such a life cycle contains more than one kind of expected lifetime
production of offspring, $R_0$ cannot satisfy C1 in the sense of being the expected
lifetime reproduction. Instead, $R_0$ is calculated from all these expectations (as the
dominant eigenvalue of this h × h submatrix). It determines per-generation growth
and population persistence as a function of the expected lifetime production of all
types of offspring in a way that satisfies C2 and C3.


The right whale The right whale produces only a single type of offspring. The
fundamental matrix N is given by (5.7), the fertility matrix is given by (5.5), and the
generation growth matrix is


$$\mathbf{F}\mathbf{N} = \begin{pmatrix} 2.18 & 2.42 & 3.06 & 2.60 & 3.06 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{5.39}$$

The dominant eigenvalue of FN is its (1, 1) entry

$$R_0 = \sum_j f_{1j}\, E(\nu_{j1}) = 2.18 \tag{5.40}$$


It is interesting to compare $R_0 = 2.18$ with $E(\nu_{14}) = 4.74$. Only female offspring
are counted in $R_0$, whereas $E(\nu_{14})$ counts reproductive events regardless of the sex
of the offspring produced. Still, $R_0$ is less than half of $E(\nu_{14})$, because of the less
than perfect survival of calves from t to t + 1.


5.4.1 Net Reproductive Rate in Periodic Environments 


Periodic time-varying models (Caswell 2001, Chapter 13) are an interesting special 
case of the multiple offspring type problem. In a periodic model, apparently 
identical offspring (e.g., seeds) produced at different phases of the cycle (e.g., 
seasons) are, in effect, of different types. To the extent that they face different
environments, they will differ in their expected offspring production, and $R_0$ will
differ depending on the phase of the cycle in which it is calculated.

The net reproductive rate in a periodic environment was calculated by Hunter
and Caswell (2005a) in a study of the sooty shearwater, a pelagic seabird nesting
on offshore islands in New Zealand. In that study, the year was divided into two
short phases, during which breeding and harvest of chicks occur, and a longer phase
encompassing the rest of the year. Let $\mathbf{B}_i = \mathbf{U}_i + \mathbf{F}_i$ be the projection matrix in phase
i of the cycle. Without loss of generality, consider an environment with a period of 2
(e.g., winter and summer). The population is projected over a year, starting in phase
1, by

$$\mathbf{A}_1 = \mathbf{B}_2 \mathbf{B}_1 \tag{5.41}$$

which is decomposed as

$$
\begin{aligned}
\mathbf{A}_1 &= (\mathbf{U}_2 + \mathbf{F}_2)(\mathbf{U}_1 + \mathbf{F}_1) \\
&= \mathbf{U}_2\mathbf{U}_1 + \mathbf{U}_2\mathbf{F}_1 + \mathbf{F}_2\mathbf{U}_1 + \mathbf{F}_2\mathbf{F}_1
\end{aligned}
\tag{5.42}
$$

The first term includes only transitions, whereas the last three terms all describe
some aspect of reproduction. Thus the annual matrix is $\mathbf{A}_1 = \hat{\mathbf{U}}_1 + \hat{\mathbf{F}}_1$, where

$$\hat{\mathbf{U}}_1 = \mathbf{U}_2\mathbf{U}_1 \tag{5.43}$$
$$\hat{\mathbf{F}}_1 = \mathbf{U}_2\mathbf{F}_1 + \mathbf{F}_2\mathbf{U}_1 + \mathbf{F}_2\mathbf{F}_1 \tag{5.44}$$

and

$$R_0^{(1)} = \rho\left[\hat{\mathbf{F}}_1\left(\mathbf{I} - \hat{\mathbf{U}}_1\right)^{-1}\right] \tag{5.45}$$


where the superscript 1 indicates that this is the net reproductive rate of a generation
beginning in season 1. The corresponding matrices for a generation starting in
season 2 are obtained from


$$\mathbf{A}_2 = \mathbf{B}_1 \mathbf{B}_2 \tag{5.46}$$


and lead to a net reproductive rate $R_0^{(2)}$. It is easily verified that $R_0^{(1)} \neq R_0^{(2)}$ in
general. This contrasts with the population growth rate $\lambda$, which is independent of
cyclic permutation of the seasons. However, since $\lambda$ is the same for $\mathbf{A}_1$ and $\mathbf{A}_2$, it
must be the case that $R_0^{(1)}$ and $R_0^{(2)}$ are both greater than or less than 1 together.
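
The two-phase calculation in (5.41) to (5.46) can be sketched as follows. The phase
matrices are hypothetical two-stage (juvenile, adult) examples and the function names are
assumptions. The sketch prints both net reproductive rates and both annual growth rates so
that they can be compared.

```python
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def annual_R0(U_a, F_a, U_b, F_b):
    """R0 for a generation that begins in phase a, which is followed by phase b.

    The annual matrix B_b B_a is split into a transition-only part and a part
    involving reproduction, following (5.42) to (5.45).
    """
    U_year = U_b @ U_a
    F_year = U_b @ F_a + F_b @ U_a + F_b @ F_a
    N_year = np.linalg.inv(np.eye(U_a.shape[0]) - U_year)
    return spectral_radius(F_year @ N_year)

# Hypothetical phase-specific matrices (illustrative values only).
U1 = np.array([[0.2, 0.0], [0.5, 0.9]])
F1 = np.array([[0.0, 0.3], [0.0, 0.0]])
U2 = np.array([[0.4, 0.0], [0.3, 0.8]])
F2 = np.array([[0.0, 1.0], [0.0, 0.0]])

R0_1 = annual_R0(U1, F1, U2, F2)                  # generation starting in phase 1
R0_2 = annual_R0(U2, F2, U1, F1)                  # generation starting in phase 2
lam_1 = spectral_radius((U2 + F2) @ (U1 + F1))    # A_1 = B_2 B_1
lam_2 = spectral_radius((U1 + F1) @ (U2 + F2))    # A_2 = B_1 B_2
print("R0 by starting phase:", R0_1, R0_2)
print("annual growth rate by starting phase:", lam_1, lam_2)
```

The annual growth rates agree because the eigenvalues of a matrix product are unchanged by
cyclic permutation of the factors, whereas the split into transitions and reproduction is
not invariant in that way.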

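An alternative formulation of $R_0$ in periodic environments was published at the
same time as Caswell (2009), by Bacaër (2009). He wrote the model, using methods
equivalent to those in Sect. 5.5 below, by jointly classifying individuals by stage and
by their phase within a seasonal cycle. Let $\mathbf{A}_i = \mathbf{U}_i + \mathbf{F}_i$ be the projection matrix
in season i. Then, for example with three seasons, the projection matrix would take
the block-circulant form


$$\mathbf{A} = \begin{pmatrix} \mathbf{0} & \mathbf{0} & \mathbf{A}_3 \\ \mathbf{A}_1 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_2 & \mathbf{0} \end{pmatrix} \tag{5.47}$$


(with similar formulations for U and F). After some manipulations, Bacaër shows
that $R_0$ is the dominant eigenvalue of the matrix³


$$\begin{pmatrix} \mathbf{F}_1 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{F}_3 \end{pmatrix} \begin{pmatrix} -\mathbf{U}_1 & \mathbf{I} & \mathbf{0} \\ \mathbf{0} & -\mathbf{U}_2 & \mathbf{I} \\ \mathbf{I} & \mathbf{0} & -\mathbf{U}_3 \end{pmatrix}^{-1} \tag{5.49}$$


³ It might be easier to apply the Cushing-Zhou theorem directly to $\mathbf{A}$ and write

$$R_0 = \rho\left(\mathbf{F}\left(\mathbf{I} - \mathbf{U}\right)^{-1}\right) \tag{5.48}$$

but Bacaër does not do this.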


Bacaër (2009) proves that $R_0$ calculated in this way satisfies condition C3, providing
an indicator for population growth ($R_0 > 1$) or decline ($R_0 < 1$). However, this
definition of $R_0$ does not satisfy C1 because it does not distinguish the different
lifetime reproductive output of individuals born in different seasons.

Cushing and Ackleh (2012) returned to this issue. They argue that the standard
approach to the dynamics of periodic models is to study the “periodic
composite map,” the product of the phase-specific matrices as in (5.41), which
projects over the entire cycle rather than from one season to the next. They separate
transitions and reproduction as in Eqs. (5.43) and (5.44), and prove that $R_0$
calculated in this way satisfies C1 (with a different lifetime reproductive output for
each starting season) and C3 (so that the values of $R_0$ in each season agree in their
determination of positive or negative growth). Cushing and Ackleh (2012) also explore
the net reproductive rate in nonlinear models, in which $R_0$ calculated at zero density
determines whether the extinction equilibrium is stable.

In the end, it is valuable to have two different ways of calculating $R_0$, but their
coexistence highlights the need to specify carefully which properties one wants the
index to have.


5.4.2 Sensitivity of the Net Reproductive Rate 


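Since $R_0$ is obtained as an eigenvalue, its sensitivity to parameter changes is easy
to derive. Let x and y be the right and left eigenvectors of FN corresponding to $R_0$,
scaled so that $\mathbf{y}^\top\mathbf{x} = 1$. Then (Caswell 2006) the sensitivity of $R_0$ is


$$\frac{dR_0}{d\boldsymbol{\theta}^\top} = \left(\mathbf{x}^\top\mathbf{N}^\top \otimes \mathbf{y}^\top\right)\frac{d\,\mathrm{vec}\,\mathbf{F}}{d\boldsymbol{\theta}^\top} + \left(\mathbf{x}^\top\mathbf{N}^\top \otimes \mathbf{y}^\top\mathbf{F}\mathbf{N}\right)\frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^\top} \tag{5.50}$$


The first term captures the effects of changing fertility, the second term captures
effects of changes in survival and transitions. The derivation of (5.50) is given in
Appendix A.4.


Hint To derive (5.50), write $R_0 = \rho[\mathbf{F}\mathbf{N}]$ and write $dR_0$ in terms of the right and
left eigenvectors of FN and the differential of FN. Then expand d(FN) = (dF)N +
F d(N) and apply the vec operator and the chain rule.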
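
The eigenvector formula for the sensitivity of the net reproductive rate can be sketched
numerically as below, taking the entries of F and of U themselves as the parameters. The
sketch assumes that x is the right eigenvector and y the left eigenvector, scaled so that
their inner product is 1; the function names and example matrices are hypothetical.

```python
import numpy as np

def net_reproductive_rate(U, F):
    N = np.linalg.inv(np.eye(U.shape[0]) - U)
    return np.max(np.abs(np.linalg.eigvals(F @ N)))

def R0_sensitivity(U, F):
    """Derivatives of R0 with respect to vec(F)^T and vec(U)^T.

    vec stacks columns, so entry (i, j) of a matrix is element j*s + i of each
    returned vector.
    """
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    FN = F @ N
    vals, vecs = np.linalg.eig(FN)
    x = np.real(vecs[:, np.argmax(np.real(vals))])        # right eigenvector
    vals_t, vecs_t = np.linalg.eig(FN.T)
    y = np.real(vecs_t[:, np.argmax(np.real(vals_t))])    # left eigenvector
    y = y / (y @ x)                                         # scale so y^T x = 1
    Nx = N @ x
    dR0_dvecF = np.kron(Nx, y)          # (x^T N^T) kron y^T
    dR0_dvecU = np.kron(Nx, y @ FN)     # (x^T N^T) kron (y^T F N)
    return dR0_dvecF, dR0_dvecU

# Hypothetical three-stage example (illustrative values only).
U_example = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.6, 0.0],
                      [0.0, 0.2, 0.9]])
F_example = np.array([[0.0, 0.0, 1.5],
                      [0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])
dF, dU = R0_sensitivity(U_example, F_example)
print("dR0/dF reshaped:", dF.reshape(3, 3, order="F"))
print("dR0/dU reshaped:", dU.reshape(3, 3, order="F"))
```

Reshaping with `order="F"` recovers matrices whose (i, j) entries are the sensitivities of
R0 to the (i, j) entries of F and U.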


The right whale The elasticity of $R_0$ is shown in Fig. 5.5; $R_0$ is most elastic to
$\sigma_3$, less so to $\sigma_2$ and $\sigma_4$. Remarkably, the elasticity of $R_0$ to the birth probability
$\gamma_3$ is zero (actually very small, but not exactly zero). This is a case where lifetime
reproductive output is affected strongly by survival, slightly by maturation, but not at
all by the probability of breeding given survival. This seems to be a consequence of the
lower survival probability of mothers; an increase in $\gamma_3$ increases the probability of
reproduction, but reduces the lifetime over which that reproduction will be realized.

Fig. 5.5 The elasticity, to each of the vital rates, of the net reproductive rate ($R_0$) for
the right whale. Parameters as in Fig. 5.2


5.4.3 Invasion Exponents, Selection Gradients, and Ro 


Selection on life history traits can be studied in terms of the invasion exponent,
which measures the rate at which a mutation, introduced at low densities, will
increase in the environment created by a resident phenotype (Metz et al. 1992;
Ferrière and Gatto 1993); for a recent introduction see Otto and Day (2007). The
selection gradient on a trait is the derivative of the invasion exponent with respect to
the value of the trait. If the derivative is positive, selection favors an increase in the
trait, and vice-versa. The invasion exponent in a density-independent model is given
by $\log\lambda$. In a density-dependent model, the invasion exponent is given by the growth
rate at equilibrium, $\lambda[\hat{\mathbf{n}}]$. The net reproductive rate $R_0$ is not, strictly speaking,
an invasion exponent, but because it measures expected lifetime reproduction, it is
attractive as a measure of fitness (see, e.g., the discussion in Kozlowski 1999). Using
$R_0$ as a measure of fitness will lead to erroneous conclusions unless the selection
gradients, measured in terms of $\lambda$ and of $R_0$, give the same answers, i.e., unless
$dR_0/d\theta \propto d\log\lambda/d\theta$.

For an age-classified model, we write $R_0$ in terms of the net maternity function
$\phi(x, \theta) = \ell(x, \theta)\, m(x, \theta)$ where both survival and reproduction depend on some
parameter $\theta$. Then


$$R_0(\theta) = \int_0^\infty \phi(x, \theta)\, dx \tag{5.51}$$

The growth rate $r = \log\lambda$ is the solution to

$$1 = \int_0^\infty \phi(x, \theta)\, e^{-r(\theta)x}\, dx \tag{5.52}$$


Differentiating (5.51) and (5.52) gives


$$\frac{dR_0}{d\theta} = \int_0^\infty \frac{\partial\phi(x, \theta)}{\partial\theta}\, dx \tag{5.53}$$

$$\frac{dr}{d\theta} = \frac{\displaystyle\int_0^\infty e^{-rx}\, \frac{\partial\phi(x, \theta)}{\partial\theta}\, dx}{\displaystyle\int_0^\infty x\, \phi(x, \theta)\, e^{-rx}\, dx} \tag{5.54}$$


Equation (5.54) is Hamilton’s (1966) famous result; the denominator is the generation
time measured as the average age of reproduction in the stable age distribution
(see Chap. 3).

When $R_0 = 1$ and r = 0, it follows from (5.53) and (5.54) that the gradients
$dr/d\theta$ and $dR_0/d\theta$ are proportional. Use of either will lead to the same conclusions
about selection. But when $r \neq 0$, this is not the case. If r > 0, then $dr/d\theta$ is
reduced for traits that operate at later ages, because $\partial\phi/\partial\theta$ is weighted by $e^{-rx}$. It
is an open problem to generalize this result to stage-classified models, and prove
that


$$\frac{d\log\lambda}{d\boldsymbol{\theta}^\top} \propto \frac{dR_0}{d\boldsymbol{\theta}^\top} \tag{5.55}$$


when $\lambda = R_0 = 1$. In a few cases I have examined, it appears to be true numerically.
As the following example shows, it is certainly the case that when $\lambda \neq 1$, the
derivatives are not generally proportional.


The right whale The lack of proportionality between the selection gradients in
terms of $\lambda$ and of $R_0$ means that evolutionary conclusions will differ depending
on which is used, especially when tradeoffs exist between two or more traits.
For example, for the right whale, $\lambda = 1.025$ and $R_0 = 2.183$. Figure 5.6
shows the sensitivity of $\lambda$ and of $R_0$; while the patterns are similar, they are not
proportional, and the use of $R_0$ as an invasion exponent would result in erroneous
predictions. Suppose a trait existed that would increase the birth probability $\gamma_3$
at the cost of a reduction in calf survival $\sigma_1$, with the cost measured by
$c = -d\sigma_1/d\gamma_3$. An increase in this trait would be favored by selection provided
that


$$c < \frac{\partial\lambda/\partial\gamma_3}{\partial\lambda/\partial\sigma_1} \tag{5.56}$$
Fig. 5.6 (a) The sensitivity, to each of the vital rates, of the net reproductive rate $R_0$ for the right
whale. (b) The sensitivity of population growth rate $\lambda$. The derivative of $\lambda$ is the selection gradient;
use of the derivative of $R_0$ leads to erroneous predictions unless the population is at equilibrium.
Parameters as in Fig. 5.2


But if expected lifetime reproduction were used as an invasion exponent, the analysis
would conclude that selection would favor an increase in the trait only if


$$c < \frac{\partial R_0/\partial\gamma_3}{\partial R_0/\partial\sigma_1} \approx 0 \tag{5.57}$$


That is, according to $R_0$, any cost whatsoever of increased birth rate would
prevent selection from favoring it. According to $\lambda$ (and correctly, in this case),
selection would favor increased birth rate provided that the cost was not too great.
In spite of the superficial similarity of the patterns in Fig. 5.6, the evolutionary
implications are quite different, reflecting the impact of timing of life history
events on $\lambda$. The sensitivities of $\lambda$ to $\sigma_2$ and $\gamma_2$, which influence early survival
and the age at maturity, are larger than the sensitivities of $R_0$ to the same
parameters.
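
The threshold on the cost c in this example follows from the chain rule. As a short sketch,
treat calf survival as a function of the birth probability through the tradeoff, and assume
that the sensitivity of $\lambda$ to $\sigma_1$ is positive:

$$
\frac{d\lambda}{d\gamma_3}
= \frac{\partial\lambda}{\partial\gamma_3} + \frac{\partial\lambda}{\partial\sigma_1}\frac{d\sigma_1}{d\gamma_3}
= \frac{\partial\lambda}{\partial\gamma_3} - c\,\frac{\partial\lambda}{\partial\sigma_1} > 0
\quad\Longleftrightarrow\quad
c < \frac{\partial\lambda/\partial\gamma_3}{\partial\lambda/\partial\sigma_1}
$$

The same argument with $R_0$ in place of $\lambda$ gives the corresponding threshold in terms of
the sensitivities of $R_0$, which is why a near-zero sensitivity of $R_0$ to $\gamma_3$ leaves almost
no room for any cost.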


5.4.4 Beyond Ro: Individual Stochasticity in Lifetime 
Reproduction 


Variation among individuals is fundamental to population biology. As argued
here, two sources of variation must be distinguished: heterogeneity and individual
stochasticity. Heterogeneity refers to genuine differences among individuals,
because of which the individuals experience different vital rates. Individual
stochasticity refers to the apparent differences that result from the random outcome of
identical vital rates, applied to identical individuals. We have seen above that
individual stochasticity is always present. That is particularly true of lifetime
reproductive output (LRO). The net reproductive rate is the expectation of LRO,
but what can we say about the variance among individuals?

Empirical measurement shows that LRO is usually highly variable among 
individuals and positively skewed. Typically, a few individuals produce many 
offspring while most produce few, or none at all (Clutton-Brock 1988; Newton 
1989). If this variance reflected heterogeneity among individual properties, and 
if the heterogeneity had a genetic basis, the variance would provide material for 
natural selection (the “opportunity for selection” of Crow 1958). Population and 
quantitative genetics are replete with methods to measure such genetic variation; 
e.g., Lande and Arnold (1983) and Endler (1986). 

However, variance among individuals in LRO is not evidence of heterogeneity, 
genetic or otherwise; some is due to individual stochasticity. Only after evaluating 
the extent of individual stochasticity can data on LRO be interpreted as evidence 
for heterogeneity (Caswell 2011; Tuljapurkar et al. 2009; Steiner et al. 2010; 
Steiner and Tuljapurkar 2012). Caswell (2011) developed a method to calculate 
the mean, variance, and higher moments of lifetime reproductive output for any 
age- or stage-classified life cycle, using Markov chains with rewards; see van 
Daalen and Caswell (2015, 2017) for full details. In these models⁴ the movement of
the individual through its life cycle is described by an absorbing Markov chain; 
mortality appears as transitions to an absorbing (dead) state. At each step, the 
individual accumulates a “reward.” In our context, the reward is the production 
of offspring. The reproductive reward is a random variable with a specified set 
of moments. The reward accumulated by the inevitable death of the individual is 
its LRO. Although every individual experiences the same vital rates—there is no 
heterogeneity—each individual may experience a different life and thus a different 
lifetime reproductive output. 

Stage-specific reproductive output is specified by a set of reward matrices $\mathbf{R}_k$.
The (i, j) element of $\mathbf{R}_k$ is the kth moment of the reproductive output associated
with the transition from stage j to stage i. Given the reward matrices, the Markov
chain transition matrix P, and the reasonable assumption that the dead do not
reproduce, all the moments of LRO can be calculated (van Daalen and Caswell
2017).

Let $\tilde{\boldsymbol{\rho}}_k$ be a vector containing the kth moments of LRO for individuals starting
in each transient (living) stage. Then, it has been shown (van Daalen and Caswell
2017) that, e.g., the first two moments of LRO are


$$\tilde{\boldsymbol{\rho}}_1 = \mathbf{N}^\top \mathbf{Z} \left(\mathbf{P} \circ \mathbf{R}_1\right)^\top \mathbf{1}_{s+1} \tag{5.58}$$


$$\tilde{\boldsymbol{\rho}}_2 = \mathbf{N}^\top \left[ \mathbf{Z} \left(\mathbf{P} \circ \mathbf{R}_2\right)^\top \mathbf{1}_{s+1} + 2 \left(\mathbf{U} \circ \tilde{\mathbf{R}}_1\right)^\top \tilde{\boldsymbol{\rho}}_1 \right] \tag{5.59}$$


where s is the number of stages in the life cycle, $\circ$ denotes the Hadamard product,
$\mathbf{N} = (\mathbf{I} - \mathbf{U})^{-1}$ is the fundamental matrix of the Markov chain, Z is a matrix that
selects the living states, and $\tilde{\mathbf{R}}_1$ is the s × s block of $\mathbf{R}_1$ for transitions among the
living states. From these moment vectors we can calculate all the statistics
of LRO. In addition, the full sensitivity analysis, calculating the derivatives of any
of the moments of LRO to any parameters affecting any of the transition, mortality,
or reward matrices, has been presented by van Daalen and Caswell (2017).


⁴ Markov chains with rewards have a long history in stochastic process theory; see Howard (1960),
Puterman (1994), and Sheskin (2010).

One of the most significant findings of this line of research has been that, in many 
cases, individual stochasticity can account for most or all of the observed phenotypic 
variance in LRO (Steiner and Tuljapurkar 2012; van Daalen and Caswell 2017). It 
appears that the contribution of stochasticity to variance in lifetime reproductive 
output has been underappreciated. 
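
A small sketch of the first two moments of LRO in (5.58) and (5.59) is given below. It
assumes the layout described in the text: a column-stochastic transition matrix with the
dead state last, and reward matrices indexed in the same way. The function name
`lro_moments` and the one-stage example are hypothetical.

```python
import numpy as np

def lro_moments(P, R1, R2):
    """First and second moments of lifetime reproductive output, by starting stage.

    P  : (s+1) x (s+1) column-stochastic matrix; the last state is absorbing (dead).
    R1 : first moments of the reward attached to each transition j -> i.
    R2 : second moments of the same rewards.
    """
    s = P.shape[0] - 1
    U = P[:s, :s]                                     # transitions among living stages
    N = np.linalg.inv(np.eye(s) - U)                  # fundamental matrix
    Z = np.hstack([np.eye(s), np.zeros((s, 1))])      # selects the living states
    ones = np.ones(s + 1)
    rho1 = N.T @ Z @ (P * R1).T @ ones
    rho2 = N.T @ (Z @ (P * R2).T @ ones + 2 * (U * R1[:s, :s]).T @ rho1)
    return rho1, rho2

# Hypothetical example: one living stage, survival 0.5, and exactly one offspring
# produced in every time step spent alive (no reward after death).
P = np.array([[0.5, 0.0],
              [0.5, 1.0]])
R1 = np.array([[1.0, 0.0],
               [1.0, 0.0]])
R2 = R1 ** 2            # a fixed reward of 1 has second moment 1
rho1, rho2 = lro_moments(P, R1, R2)
variance = rho2 - rho1 ** 2
print("mean LRO:", rho1, "variance of LRO:", variance)
```

In this example LRO equals the number of time steps lived, so its mean and variance
coincide with those of longevity, and all of the variance is individual stochasticity.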


5.5 Variable and Stochastic Environments 


The variance due to individual stochasticity can be examined in the case of 
variable environments (Caswell 2006; Tuljapurkar and Horvitz 2006; Horvitz and 
Tuljapurkar 2008; see also Chap. 8). Several cases can be considered: 


• Deterministic aperiodic environments. These usually appear as specific historical
sequences; e.g., the specific sequence of vital rates exhibited by the right whale
between 1980 and 1998 (Caswell 2006). That sequence is fixed, and is neither
random nor periodic.

• Periodic environments. A periodic model may describe seasonal variation within
a year, or may approximate inter-annual variability in events such as floods, fires,
or hurricanes.

• Stochastic iid environments. In such environments, successive states are drawn
independently from a fixed probability distribution; hence the identifier iid, short
for “independent and identically distributed.”

• Markovian stochastic environments. In a Markovian environment the probability
distribution of the next environmental state may depend on the current state.
This permits study of the effects of environmental autocorrelation. Markovian
environments include periodic and iid environments as special cases.


See Tuljapurkar (1990) for a thorough discussion of types of stochastic environ- 
ments. 

When studying variable environments, it is important to distinguish period and 
cohort calculations. Period calculations are based on the vital rates in a given year. 
They describe the results of the hypothetical situation where the conditions of year 
t are maintained indefinitely, and compare those to the results for conditions in 
year t + 1, etc. Period calculations are a way to summarize the effects of changing 
environment. But an individual born in year t does not live its life under the
conditions of year t. It spends its first year of life under the conditions in year t,
its second year under the conditions of year t + 1, and so on. Results calculated
in this way are called cohort calculations, because they describe a cohort born in
year t and living through the environmental sequence starting then. Period-specific
calculations are easy; simply apply the time-invariant calculation to the vital rates 
of each year and tabulate the results. Cohort calculations, however, must account for 
all the possible environmental sequences through which a cohort may pass. Caswell 
(2006) and Tuljapurkar and Horvitz (2006) independently introduced two different, 
complementary approaches to doing so. I will present the former approach here. 


5.5.1 A Model for Variable Environments 


In a variable environment, the transient matrix U is a time-varying matrix U(t). We 
can define a fundamental matrix by 


$$\mathbf{N} = \mathbf{I} + \mathbf{U}(0) + \mathbf{U}(1)\mathbf{U}(0) + \mathbf{U}(2)\mathbf{U}(1)\mathbf{U}(0) + \cdots \tag{5.60}$$


The (i, j) element of N is the expected occupancy time in transient state i by 
an individual starting in transient state j at time 0, and experiencing the specific 
sequence of environments U(0), U(1), .... Thus there will be a different matrix N 
for each possible environmental sequence. 

Tuljapurkar and Horvitz (2006), whose paper I highly recommend, work directly
from (5.60) to develop the means and variances of N, $\boldsymbol{\eta}$, and survivorship, in
periodic, iid, and Markovian environments. Here, we consider an approach in
which an individual is jointly classified by stage and environment, using the
vec-permutation model developed by Hunter and Caswell (2005b).

Suppose that there are q environmental states $\epsilon = 1, \ldots, q$ and s stages
$g = 1, \ldots, s$. Corresponding to environment i is an s × s transient matrix $\mathbf{U}_i$. Assemble
the matrices $\mathbf{U}_i$ into a block-diagonal matrix


$$\mathbb{U} = \begin{pmatrix} \mathbf{U}_1 & & \\ & \ddots & \\ & & \mathbf{U}_q \end{pmatrix} \tag{5.61}$$


of dimension sq × sq.

The transitions among environmental states are defined by a q × q column-stochastic
matrix D. Use the matrix D to construct a block-diagonal environmental
transition matrix


$$\mathbb{D} = \begin{pmatrix} \mathbf{D} & & \\ & \ddots & \\ & & \mathbf{D} \end{pmatrix} \tag{5.62}$$


of dimension sq × sq.
Suppose that there are 4 environmental states. In an aperiodic deterministic
environment,


$$\mathbf{D} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \tag{5.63}$$


That is, the environment moves deterministically from state 1 to state 2 to state 3 to
state 4. Setting $d_{44} = 1$ solves the problem of what to do at the end of the sequence,
by the (possibly satisfactory) trick of letting the final state repeat indefinitely. In a
periodic environment,


$$\mathbf{D} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{5.64}$$


In an iid environment in which environment i occurs with probability $\pi_i$,


$$\mathbf{D} = \begin{pmatrix} \pi_1 & \pi_1 & \pi_1 & \pi_1 \\ \pi_2 & \pi_2 & \pi_2 & \pi_2 \\ \pi_3 & \pi_3 & \pi_3 & \pi_3 \\ \pi_4 & \pi_4 & \pi_4 & \pi_4 \end{pmatrix} \tag{5.65}$$

In a Markovian environment, D is a column-stochastic transition matrix describing
the transition probabilities. I will assume that the environmental Markov chain is
ergodic, with a stationary probability distribution denoted by $\boldsymbol{\pi}$. This gives the
long-term frequency of occurrence of each environmental state.
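
The stationary distribution of a column-stochastic environmental matrix can be obtained
from the eigenvector belonging to eigenvalue 1. The sketch below is one way to do this; the
function name and the example matrices are assumptions for illustration.

```python
import numpy as np

def stationary_distribution(D):
    """Stationary distribution of a column-stochastic matrix D (columns sum to 1)."""
    vals, vecs = np.linalg.eig(D)
    k = np.argmin(np.abs(vals - 1.0))      # locate the eigenvalue equal to 1
    pi = np.real(vecs[:, k])
    return pi / pi.sum()                   # normalize to a probability vector

# iid environment: every column of D equals the vector of state probabilities.
probs = np.array([0.1, 0.2, 0.3, 0.4])     # hypothetical probabilities
D_iid = np.outer(probs, np.ones(4))

# Two-state Markovian environment with positive autocorrelation (hypothetical).
D_markov = np.array([[0.9, 0.3],
                     [0.1, 0.7]])

print(stationary_distribution(D_iid))
print(stationary_distribution(D_markov))
```

For the iid case the stationary distribution is simply the vector of state probabilities,
which provides a quick check on the construction of D.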

The state of the cohort can be specified by a matrix X, of dimension s × q, with
rows corresponding to stages and columns to environments, and where $x_{ij}(t)$ is the
expected number of individuals in stage i and environmental state j at time t.


$$\mathbf{X}(t) = \begin{pmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{s1} & \cdots & x_{sq} \end{pmatrix} \tag{5.66}$$


We rearrange X into a vector by applying the vec operator to $\mathbf{X}^\top$,


$$\mathrm{vec}\,\mathbf{X}^\top = \left( x_{11}, \ldots, x_{1q} \mid \cdots \mid x_{s1}, \ldots, x_{sq} \right)^\top \tag{5.67}$$

The first block of entries gives stage 1 individuals in environments 1 through q. The
second block gives stage 2 individuals in environments 1 through q, and so on.

To describe the dynamics of the cohort, suppose that individuals first move
among stages, according to the vital rates determined by the current environment,
and then the environment changes to a new state according to D. Then


$$\mathrm{vec}\,\mathbf{X}^\top(t+1) = \mathbb{D}\,\mathbf{K}_{s,q}\,\mathbb{U}\,\mathbf{K}_{s,q}^\top\,\mathrm{vec}\,\mathbf{X}^\top(t) \tag{5.68}$$


The matrix $\mathbf{K}_{s,q}$ is the vec-permutation matrix (Henderson and Searle 1981; Hunter
and Caswell 2005b), or commutation matrix (Magnus and Neudecker 1979), which
permutes the entries of a vector so that


$$\mathrm{vec}\,\mathbf{X}^\top = \mathbf{K}_{s,q}\,\mathrm{vec}\,\mathbf{X} \tag{5.69}$$


(see Sect. 2.2.3). Like all permutation matrices, its transpose is also its inverse.
Its role here is to rearrange the population vector into a form appropriate for
multiplication by the block-diagonal matrices $\mathbb{U}$ and $\mathbb{D}$.

Working from right to left, (5.68) first rearranges the vector, then applies the
block-transition matrix $\mathbb{U}$, then reverses the rearrangement of the vector, and finally
applies the environmental transition block matrix $\mathbb{D}$ to obtain the expected cohort at
t + 1. This gives a transition matrix for the joint process,


$$\tilde{\mathbf{U}} = \mathbb{D}\,\mathbf{K}_{s,q}\,\mathbb{U}\,\mathbf{K}_{s,q}^\top \tag{5.70}$$


that incorporates the demographic transitions within each environment and the
patterns of time variation among environments.⁵ Here and in what follows, the tilde
distinguishes the matrix from the environment-specific matrices.

Matrices of similar form, but not using this formalism, were introduced by 
Horvitz to study populations in habitat patches where the habitat patches change 
state over time, for example in recovering from disturbance (Horvitz and Schemske 
1986; Pascarella and Horvitz 1998). Horvitz introduced the term “megamatrix” to 
describe these models. A megamatrix, in the sense of Horvitz, is a special case
of (5.70) when the population is classified by stages within environmental states, the
demographic matrices are applied first, and the environmental transition matrices $\mathbf{D}_i$
are identical for all stages, as is the case in (5.62).


⁵ Note that (5.68) computes the expected population at t + 1 from the expected population at t.
It might be tempting to do this with the projection matrix A and use the eigenvalues of A to
calculate the stochastic population growth rate. However, this would give the growth rate of the
mean population, but not the stochastic growth rate (which is always less than or equal to the
growth rate of the mean population). For calculations such as moments of longevity, which are
explicitly properties of the expected population, the difference does not arise.
5.5.2 The Fundamental Matrix


Since $\tilde{\mathbf{U}}$ is the transient matrix of an absorbing Markov chain, the fundamental
matrix in the time-varying environment is


$$\tilde{\mathbf{N}} = \left(\mathbf{I}_{sq} - \tilde{\mathbf{U}}\right)^{-1} \tag{5.71}$$


The elements of $\tilde{\mathbf{N}}$ give the expected occupancy times in each stage, in each
environment, as a function of the starting stage and starting environment.
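
The construction of the joint transition matrix and its fundamental matrix can be sketched
as follows. The helper names, the two-stage matrices, and the two-state periodic
environment are all assumptions for illustration. The state vector is ordered as in (5.67),
with environments nested within stages.

```python
import numpy as np

def vec_permutation(s, q):
    """K such that K @ vec(X) = vec(X.T) for an s x q matrix X (column-stacking vec)."""
    K = np.zeros((s * q, s * q))
    for i in range(s):
        for j in range(q):
            K[i * q + j, j * s + i] = 1.0
    return K

def block_diag(mats):
    """Block-diagonal matrix assembled from a list of square matrices."""
    n = sum(m.shape[0] for m in mats)
    out = np.zeros((n, n))
    r = 0
    for m in mats:
        k = m.shape[0]
        out[r:r + k, r:r + k] = m
        r += k
    return out

def joint_fundamental_matrix(U_list, D):
    """Joint transient matrix as in (5.70) and its fundamental matrix as in (5.71)."""
    s, q = U_list[0].shape[0], D.shape[0]
    K = vec_permutation(s, q)
    U_block = block_diag(U_list)           # demography, one block per environment
    D_block = np.kron(np.eye(s), D)        # environmental change, one block per stage
    U_tilde = D_block @ K @ U_block @ K.T
    N_tilde = np.linalg.inv(np.eye(s * q) - U_tilde)
    return U_tilde, N_tilde

# Hypothetical example: two stages, two environments that alternate periodically.
U_list = [np.array([[0.5, 0.0], [0.3, 0.9]]),
          np.array([[0.2, 0.0], [0.4, 0.7]])]
D = np.array([[0.0, 1.0],
              [1.0, 0.0]])
U_tilde, N_tilde = joint_fundamental_matrix(U_list, D)

# Sum occupancy over environments within each stage, then keep the columns that
# correspond to starting in environment 1.
s, q = 2, 2
by_stage = np.kron(np.eye(s), np.ones((1, q))) @ N_tilde
from_env1 = by_stage[:, 0::q]
print(from_env1)
```

For a deterministic sequence such as this one, the result can be compared with the direct
sum of matrix products in (5.60), using the sequence of environments that starts in
environment 1.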


Notation alert Developing a complete system of notation for $\tilde{\mathbf{N}}$ would obscure
more than it would clarify. Pictures can help. As I present the fundamental matrix
and some of the properties calculated from it, I will use diagrams for a simple
case with three stages and two environments. I will often indicate the dimension of
matrices and vectors with subscripts. I will use g to denote stages ($g = 1, 2, \ldots, s$)
and $\epsilon$ to denote environments ($\epsilon = 1, \ldots, q$). I will use superscripts on $\tilde{\mathbf{N}}$ and
quantities derived from it, to distinguish different ways of combining information
across environmental states (see Table 5.1).

Recall that in a constant environment, $\nu_{ij}$ was the number of visits to stage i,
starting in stage j. Now we must consider the visits to stage i in environment $\epsilon$,
starting in stage j and environment $\epsilon_0$, so we write


$$\tilde{\mathbf{N}} = \Big( E\left(\nu_{ij,\epsilon} \mid \epsilon_0\right) \Big) \tag{5.72}$$


Table 5.1 Superscript notation for time-varying models. The tilde indicates quantities calculated 
from the complete transient matrix $\tilde{\mathbf{U}}$ in (5.70). Occupancy and times to absorption depend on the
initial and final demographic and environmental states. The superscripts (4, §, O) indicate choices 
of summing and averaging over the environmental states. The superscripts are shown here for the 
fundamental matrix N 


Symbol | Definition | Description Equation 

N E (vij,<l€o) Expected visits to state i in environment €, starting from | (5.72) 
state j in environment €o 

NG E (vij leo) Expected visits to state i, summed over environments, start- | (5.73) 
ing from state j in environment €o 

N+ Rearrangement of the rows and columns of Nt (5.74) 

NS E (vij,e) Expected visits to state i and environmental state €, averaged | (5.75) 
over initial environmental states 

NSS Rearrangement of the rows and columns of NS (5.76) 

NY E (vj i) Expected visits to state i summed over environments, start- | (5.77) 


ing from state j and averaged over initial environmental 
states 
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The structure of N when s = 3 and q =2is 


€0 = l €o = 2 €o = l éo = 2 éo = 1 éo = 2 
g=le=1 
~ e=2 
Nsgxsq : g=2e=1 
ce=2 
g=3e=1 
€=2 


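From $\tilde{N}$ we can obtain the expected occupancy time in each stage, regardless of the
environment in which those visits occur, by aggregating rows. The resulting matrix
$\tilde{N}^{\flat}$ is

$$\tilde{N}^{\flat} = E\left( \nu_{ij} \mid \epsilon_0 \right) = \left( I_s \otimes 1_q^T \right) \tilde{N} \tag{5.73}$$

where $1_q$ is a $q \times 1$ vector of ones. The structure of $\tilde{N}^{\flat}$ is

$\tilde{N}^{\flat}_{s \times sq}$: rows are indexed by destination stage, $g = 1, 2, 3$; columns are indexed by initial stage and initial environment, in the order ($g=1$, $\epsilon_0=1$), ($g=1$, $\epsilon_0=2$), ($g=2$, $\epsilon_0=1$), ($g=2$, $\epsilon_0=2$), ($g=3$, $\epsilon_0=1$), ($g=3$, $\epsilon_0=2$).

If it is useful to group stages within initial environments, rather than grouping
environments within stages, $\tilde{N}^{\flat}$ can be rearranged as

$$\tilde{N}^{\flat\flat} = \tilde{N}^{\flat} K_{s,q} \tag{5.74}$$

with the structure

$\tilde{N}^{\flat\flat}_{s \times sq}$: rows are indexed by destination stage, $g = 1, 2, 3$; columns are indexed by initial environment and initial stage, in the order ($\epsilon_0=1$: $g=1$, $g=2$, $g=3$), ($\epsilon_0=2$: $g=1$, $g=2$, $g=3$).

The matrices $\tilde{N}^{\flat}$ and $\tilde{N}^{\flat\flat}$ both display expected occupancy of each stage as a
function of initial state and environment. To describe the fates of individuals without
specifying their initial environment, we take an expectation over the stationary
distribution $\pi$ of initial environments. This gives

$$\tilde{N}^{\sharp} = E\left[ \nu_{ij,\epsilon} \right] = \tilde{N} \left( I_s \otimes \pi \right) \tag{5.75}$$

The structure of $\tilde{N}^{\sharp}$ is

$\tilde{N}^{\sharp}_{sq \times s}$: rows are indexed by destination stage and environment, in the order ($g=1$, $\epsilon=1$), ($g=1$, $\epsilon=2$), ($g=2$, $\epsilon=1$), ($g=2$, $\epsilon=2$), ($g=3$, $\epsilon=1$), ($g=3$, $\epsilon=2$); columns are indexed by initial stage, $g = 1, 2, 3$, averaged over the initial environment $\epsilon_0$.

The rows of $\tilde{N}^{\sharp}$ can be rearranged to display stages within environments, giving

$$\tilde{N}^{\sharp\sharp} = K_{s,q}^T \tilde{N}^{\sharp} \tag{5.76}$$

with the structure

$\tilde{N}^{\sharp\sharp}_{sq \times s}$: rows are indexed by destination environment and stage, in the order ($\epsilon=1$: $g=1$, $g=2$, $g=3$), ($\epsilon=2$: $g=1$, $g=2$, $g=3$); columns are indexed by initial stage, $g = 1, 2, 3$.

Finally, aggregating over destination environments and averaging over initial
environments gives a matrix containing the expected occupancy of stages as a
function of initial stage, averaged over environments

$$\tilde{N}^{\natural} = E\left[ \nu_{ij} \right] = \left( I_s \otimes 1_q^T \right) \tilde{N} \left( I_s \otimes \pi \right) \tag{5.77}$$

The structure of $\tilde{N}^{\natural}$ is

$\tilde{N}^{\natural}_{s \times s}$: rows are indexed by destination stage, $g = 1, 2, 3$; columns are indexed by initial stage, $g = 1, 2, 3$, averaged over the initial environment $\epsilon_0$.

The matrix $\tilde{N}^{\natural}$, obtained by the simple calculation (5.77), is “the” fundamental
matrix for the variable environment. It could be compared directly to the fundamental
matrix in a constant environment (e.g., the environment defined by one of the
environmental states).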
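The aggregations in (5.71), (5.73), (5.75), and (5.77) can be sketched numerically. The following is a minimal illustration, not the construction used in the text: the function names, the toy matrices, and in particular the way the complete transient matrix is assembled (demography under the current environment, followed by an environmental transition, with environments nested within stages) are assumptions made only for this example.

```python
import numpy as np

def stationary(D):
    """Stationary distribution of a column-stochastic matrix D."""
    vals, vecs = np.linalg.eig(D)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def build_U_tilde(U_list, D):
    """Assumed construction of the complete transient matrix.
    State index = g * q + e (environments nested within stages)."""
    q, s = len(U_list), U_list[0].shape[0]
    U_tilde = np.zeros((s * q, s * q))
    for e0, U in enumerate(U_list):          # initial environment
        for e in range(q):                   # next environment
            for g0 in range(s):              # initial stage
                for g in range(s):           # next stage
                    U_tilde[g * q + e, g0 * q + e0] = D[e, e0] * U[g, g0]
    return U_tilde

def fundamental_matrices(U_list, D):
    q, s = len(U_list), U_list[0].shape[0]
    pi = stationary(D)
    N = np.linalg.inv(np.eye(s * q) - build_U_tilde(U_list, D))   # (5.71)
    sum_env = np.kron(np.eye(s), np.ones((1, q)))                 # I_s (x) 1_q^T
    avg_env = np.kron(np.eye(s), pi.reshape(q, 1))                # I_s (x) pi
    return {
        "full": N,                                  # sq x sq
        "by_initial_env": sum_env @ N,              # (5.73), s x sq
        "averaged_initial": N @ avg_env,            # (5.75), sq x s
        "stage_only": sum_env @ N @ avg_env,        # (5.77), s x s
    }

# Hypothetical three-stage life cycle in a good and a bad environment.
U_good = np.array([[0.2, 0.0, 0.0],
                   [0.5, 0.4, 0.0],
                   [0.0, 0.4, 0.8]])
U_bad = np.array([[0.1, 0.0, 0.0],
                  [0.3, 0.3, 0.0],
                  [0.0, 0.3, 0.7]])
D = np.array([[0.7, 0.4],
              [0.3, 0.6]])       # column-stochastic environmental transitions
out = fundamental_matrices([U_good, U_bad], D)
```

Under these assumptions, setting both environments to the same transient matrix makes the `stage_only` result coincide with the constant-environment fundamental matrix, which is a convenient check on the ordering of stages and environments.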


5.5.3 Longevity in a Variable Environment 


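Life expectancy, as a function of initial stage and initial environment, is obtained by
summing the columns of $\tilde{N}$,

$$E\left( \tilde{\eta}^{\flat} \right)^T = E\left[ \tilde{\eta} \mid \epsilon_0 \right]^T = 1_{sq}^T \tilde{N} \tag{5.78}$$

The structure of $E(\tilde{\eta}^{\flat})^T$ is a $1 \times sq$ row vector whose entries are indexed by initial stage and initial environment, in the order ($g=1$, $\epsilon_0=1$), ($g=1$, $\epsilon_0=2$), ($g=2$, $\epsilon_0=1$), ($g=2$, $\epsilon_0=2$), ($g=3$, $\epsilon_0=1$), ($g=3$, $\epsilon_0=2$).

Averaging this conditional life expectancy over the stationary distribution $\pi$ of
initial environments gives

$$E\left( \tilde{\eta}^{\natural} \right)^T = E\left( \tilde{\eta}^{\flat} \right)^T \left( I_s \otimes \pi \right) \tag{5.79}$$

This measure of life expectancy in a variable environment is directly comparable to
$E(\eta)$ calculated from the same life history in a constant environment.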
5.5.3.1 Variance in Longevity 


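In a constant environment, the variance among individuals in longevity is due to
individual stochasticity. In a time-varying environment, the variance contains an
additional component due to differences among individuals as a function of their
environment at birth. Applying (5.27) to $\tilde{N}$ we obtain the variances conditional on
the initial environment:

$$V\left[ \tilde{\eta} \mid \epsilon_0 \right]^T = E\left( \tilde{\eta}^{\flat} \right)^T \left( 2\tilde{N} - I_{sq} \right) - E\left( \tilde{\eta}^{\flat} \right)^T \circ E\left( \tilde{\eta}^{\flat} \right)^T \tag{5.80}$$

As indicated by the notation, $V[\tilde{\eta} \mid \epsilon_0]$ is a conditional variance of $\tilde{\eta}$, given
the initial environment $\epsilon_0$. The initial environment is distributed according to the
stationary distribution $\pi$, so the unconditional longevity $\tilde{\eta}$ follows a finite mixture
distribution with mixing distribution $\pi$.

The unconditional variance of $\tilde{\eta}^{\natural}$, taking account of both sources of variability, is

$$V\left[ \tilde{\eta}^{\natural} \right] = V_{\pi}\left[ E(\tilde{\eta} \mid \epsilon_0) \right] + E_{\pi}\left[ V(\tilde{\eta} \mid \epsilon_0) \right] \tag{5.81}$$

where $E_{\pi}$ and $V_{\pi}$ denote the expectation and variance over the stationary distribution $\pi$ of initial
environments (Rényi 1970, p. 275, Theorem 1). This can be rearranged as

$$
\begin{aligned}
V\left[ \tilde{\eta}^{\natural} \right]^T &= E_{\pi}\left[ E(\tilde{\eta} \mid \epsilon_0)^2 \right] - \left( E_{\pi}\left[ E(\tilde{\eta} \mid \epsilon_0) \right] \right)^2 + E_{\pi}\left[ V(\tilde{\eta} \mid \epsilon_0) \right] \\
&= \left[ E\left( \tilde{\eta}^{\flat} \right)^T \circ E\left( \tilde{\eta}^{\flat} \right)^T \right] \left( I_s \otimes \pi \right) - E\left( \tilde{\eta}^{\natural} \right)^T \circ E\left( \tilde{\eta}^{\natural} \right)^T \\
&\quad + V\left[ \tilde{\eta} \mid \epsilon_0 \right]^T \left( I_s \otimes \pi \right)
\end{aligned}
\tag{5.82}
$$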


(e.g., Frühwirth-Schnatter 2006, p. 10). This variance decomposition has developed
into a powerful tool for the analysis of heterogeneity in demography (Edwards 2011;
Hartemink and Caswell 2018; Hartemink et al. 2017; Caswell et al. 2018; Jenouvrier
et al. 2018).

The choice of the mixing distribution is important. Hernandez-Suarez et al.
(2012) present an alternative where $\pi$ is the stationary distribution of births across
environments, rather than the distribution of environments itself.
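The mean and variance calculations in (5.78) to (5.82) can be sketched as follows. This is an illustrative example only: the function name, the toy matrices, and the assumed form of the complete transient matrix (environments nested within stages, demography followed by environmental change) are not taken from the text.

```python
import numpy as np

def longevity_moments(N, s, q, pi):
    """Mean and variance of longevity from a fundamental matrix N (sq x sq)
    whose states are ordered with environments nested within stages."""
    ones = np.ones(s * q)
    avg = np.kron(np.eye(s), pi.reshape(q, 1))                     # I_s (x) pi
    mean_cond = ones @ N                                           # (5.78)
    mean_avg = mean_cond @ avg                                     # (5.79)
    var_cond = mean_cond @ (2 * N - np.eye(s * q)) - mean_cond**2  # (5.80)
    between = (mean_cond**2) @ avg - mean_avg**2   # variance of conditional means
    within = var_cond @ avg                        # mean of conditional variances
    return {"mean_cond": mean_cond, "mean_avg": mean_avg, "var_cond": var_cond,
            "between": between, "within": within, "var_total": between + within}

# Hypothetical example: three stages, two environments.
U_good = np.array([[0.2, 0.0, 0.0],
                   [0.5, 0.4, 0.0],
                   [0.0, 0.4, 0.8]])
U_bad = np.array([[0.1, 0.0, 0.0],
                  [0.3, 0.3, 0.0],
                  [0.0, 0.3, 0.7]])
D = np.array([[0.7, 0.4],
              [0.3, 0.6]])            # column-stochastic
pi = np.array([4 / 7, 3 / 7])         # stationary distribution of this D
s, q = 3, 2

# Assumed complete transient matrix: sum over e of U_e (x) (D E_ee).
U_tilde = sum(np.kron(U, D @ np.diag(np.eye(q)[e]))
              for e, U in enumerate([U_good, U_bad]))
N = np.linalg.inv(np.eye(s * q) - U_tilde)
moments = longevity_moments(N, s, q, pi)
```

The `between` and `within` entries correspond to the two terms of (5.81); when the environments are demographically identical the between-environment term vanishes and the total reduces to the constant-environment variance.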


5.5.4 A Time-Varying Example: Lomatium bradshawii 


Lomatium bradshawii is an endangered herbaceous perennial plant, found in only 
a few isolated populations in prairies of Oregon and Washington. These habitats 
were, until recent times, subject to natural and anthropogenic fires, to which L. 
bradshawii seems to have adapted. Fall-season fires increase plant size and seedling 
recruitment, but the effect fades within a few years. Populations in burned areas have 
higher growth rates and lower probabilities of extinction than unburned populations 
(Caswell and Kaye 2001).

A stochastic demographic model for L. bradshawii was developed by Caswell
and Kaye (2001), Kaye et al. (2001), and Kaye and Pyke (2003) based on data from 
an experimental study using controlled burning. Individuals were classified into six 
stages based on size and reproductive status: yearlings, small and large vegetative 
plants, and small, medium, and large reproductive plants. The environment was 
classified into four states defined by fire history: the year of a fire and 1, 2, and 
3+ years post-fire. Projection matrices were estimated in each environment; the 
example here is based on one of the two sites (Rose Prairie) in the original study. 
The matrices are given in Caswell and Kaye (2001). 

L. bradshawii performs well under recently burned conditions, but less well in
sites that have not been recently burned. For example, the values of $\lambda$ are

Years since fire | 0 | 1 | 2 | ≥3
Growth rate $\lambda$ | 1.18 | 1.12 | 0.48 | 0.88

Caswell and Kaye (2001) found a minimum frequency of fire (0.4-0.5) below which
the stochastic growth rate was negative and the population would be unable to
persist. Effects of autocorrelation were small, but positive autocorrelation reduced
the stochastic growth rate.

As an example of a time-varying analysis, let us examine L. bradshawii in
a Markovian environment. Let $f$ be the long-term frequency of fire, and $\rho$ the
temporal autocorrelation. Then the transition matrix for environmental states is

$$
D = \begin{pmatrix}
p & q & q & q \\
1-p & 0 & 0 & 0 \\
0 & 1-q & 0 & 0 \\
0 & 0 & 1-q & 1-q
\end{pmatrix}
\tag{5.83}
$$

where $q = f(1 - \rho)$ and $p = \rho + q$.
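As a small check on (5.83), the matrix can be built from $f$ and $\rho$ and its stationary distribution computed. The sketch below reads the matrix as column-stochastic (columns are the current environmental state, rows the next one); the function name and the parameter values are illustrative assumptions.

```python
import numpy as np

def fire_environment(f, rho):
    """Transition matrix (5.83) among the four fire states:
    year of fire, and 1, 2, and 3+ years post-fire."""
    q = f * (1.0 - rho)      # probability of fire following a non-fire year
    p = rho + q              # probability of fire following a fire year
    if not (0.0 <= q <= 1.0 and 0.0 <= p <= 1.0):
        raise ValueError("f and rho do not give valid probabilities")
    return np.array([
        [p,       q,       q,       q      ],
        [1.0 - p, 0.0,     0.0,     0.0    ],
        [0.0,     1.0 - q, 0.0,     0.0    ],
        [0.0,     0.0,     1.0 - q, 1.0 - q],
    ])

D = fire_environment(f=0.5, rho=0.2)

# Stationary distribution: eigenvector of D for the eigenvalue 1.
vals, vecs = np.linalg.eig(D)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()
```

With this parameterization the stationary probability of the fire state equals $f$, and $p - q = \rho$, which is why $f$ and $\rho$ can be varied independently within the range that keeps $p$ and $q$ between 0 and 1.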

Figure 5.7a shows the life expectancy $E(\tilde{\eta} \mid \epsilon_0)$ of L. bradshawii as a function of
initial stage and initial environmental state, from (5.78). Life expectancy increases 
with the stage (size) of a plant. A seedling has its greatest life expectancy in the 
year of a fire, less in an environment three or more years post-fire. A large flowering 
plant, in contrast, has its greatest life expectancy in an environment three or more 
years post-fire. When the environment-dependence is averaged over the stationary 
distribution of environmental states, there is a smooth increase in life expectancy 
from ~2.5 years for a seedling to 8 years for a large flowering plant (Fig. 5.7b). The 
standard deviation of longevity also increases with stage, in a pattern very similar to 
that of the expectation. 

These patterns in the mean and variance of longevity (Fig. 5.7) depend on
the stochastic properties of the environment, in this case the frequency $f$ and
autocorrelation $\rho$ of fires. Even with an environmental model this simple, the effects
of $f$ and $\rho$ can be complicated. I know of no previous attempts to examine their
effects on longevity. To do so, I calculated life expectancy with $f = 0.5$ for
autocorrelation $-1 < \rho < 1$, and with $\rho = 0$ for fire frequency $0 < f < 1$.

Fig. 5.7 The expectation and standard deviation of longevity for Lomatium bradshawii in a
stochastic fire environment. (a) Expected longevity conditional on initial environment ($\epsilon_0$). (b)
Expected longevity averaged over the stationary distribution of initial environments. (c) The
standard deviation of longevity conditional on initial environment. (d) The standard deviation of
longevity over the stationary distribution of initial environments. The frequency of fire is 0.5 and
the temporal autocorrelation $\rho = 0.7$


The life expectancy of early life cycle stages increases monotonically with fire
frequency (Fig. 5.8a), but the life expectancy of large reproductive plants is greatest
at either low or high fire frequencies. The standard deviation of longevity increases
with $f$ (Fig. 5.8b). As $f \to 1$, the standard deviation of longevity is approximately
twice the mean.

The autocorrelation of fires has little effect on the life expectancy of seedlings,
but a larger effect on that of large plants. For the latter, life expectancy is maximized
as $\rho \to -1$ (alternating fire and non-fire years) or as $\rho \to 1$ (long periods of fires
alternating with long periods without fire). The standard deviation of longevity also
shows a strong U-shaped response to $\rho$ for all stages. The generality of this pattern
is unknown.

Fig. 5.8 The expectation and standard deviation of longevity, averaged over the stationary
distribution of initial environments, for Lomatium bradshawii, as a function of the initial stage, the
fire frequency $f$, and the temporal autocorrelation $\rho$. Parameters as in Fig. 5.7. (a) Life expectancy
$\tilde{\eta}^{\natural}$. (b) Standard deviation of longevity. (c) Life expectancy $\tilde{\eta}^{\natural}$. (d) Standard deviation of longevity


5.6 The Importance of Individual Stochasticity 


The concept of individual stochasticity strikes to the heart of one of the most
fundamental problems in population biology: the sources of variability among
individuals. Heterogeneity (genuine differences among individuals) translates into
differences in the age- or stage-specific vital rates to which they are subject.
Heterogeneity may arise from genetics, from physiological effects, from health
conditions, or from unknown causes (“frailty,” “quality”). Stochasticity results
from the random outcomes of probabilistic processes. Markov chains naturally 
treat individual trajectories (i.e., individual lives) as realizations of an underlying 
stochastic process, and so much of this chapter has been focused on the analysis 
of individual stochasticity. The distinction is particularly important in evolutionary 
demography, where variance in lifetime reproductive output is routinely treated as 
variance in fitness, or a component of fitness. See Sect. 5.4.4 for some recent work 
on this problem. 

Individual stochasticity is an important component of demography, for both 
human and non-human populations. It complements environmental stochasticity 
(externally imposed random changes in vital rates) and demographic stochasticity
(randomness in the growth of populations due to stochastic survival and reproduction)
(Caswell and Vindenes 2018). Individual stochasticity reflects randomness in
the pathways that individuals take through the life cycle. It expresses itself in inter- 
individual variation in occupancy times, longevity, lifetime reproductive output, and 
other outcomes. The availability of methods based on Markov chains promises 
to change the way population biologists approach the analysis of variance among 
individuals (Caswell 2011; Tuljapurkar et al. 2009; Steiner and Tuljapurkar 2012; 
van Daalen and Caswell 2015; van Daalen and Caswell 2017). 


5.7 Discussion 


Taking advantage of the Markov chain formulation of the life cycle opens up a 
wealth of demographic information. The age-classified information extracted from 
a stage-classified model can form a valuable component of behavioral studies, 
especially if the model (like the right whale example) includes reproductive behavior 
as part of the life cycle structure. Longevity provides a powerful way to compare 
mortality schedules among species, populations, or environmental conditions, but 
it has been inaccessible to stage-classified analysis prior to the development of 
Markov chain methods. The generation time characterizes an important population 
time scale, with implications in conservation (IUCN Species Survival Commission 
2001), but there has been no way to compute it from stage-classified models. 
Stage-classified life cycles may have consequences that are not yet appreciated, 
but must be considered when interpreting the results. For example, any stage- 
classified model eventually leads to an age-independent mortality rate (Horvitz 
and Tuljapurkar 2008), and so is of limited use in the study of senescence. This 
fact has consequences for life expectancy and variance in longevity that are not 
well understood (at least by me). For the right whale, expected longevity at birth 
is 32 years with a standard deviation of 34 years. It is unlikely that there are 
appreciable numbers of whales alive at even one standard deviation above this mean. 
The high survival probability and the assumption of age-independence lead to the 
high standard deviation. Those of us who work with stage-classified models are
accustomed to this, but discount its importance because it (often) has little effect
on $\lambda$. It will be important to determine the stochastic consequences of simplifying
assumptions in the life cycle graph. 

This chapter does not begin to exhaust the information that can be extracted 
from the Markov chain formulation of a stage-classified model. Three examples 
of particular interest are the occupancy of sets of states, the problem of competing 
risks, and the calculation of passage times. It is often of interest to calculate the 
statistics of occupancy of sets of states (e.g., all reproductive classes, or all stages 
in some particular health condition). We have seen how to calculate the moments 
of the occupancy time of single states. The mean occupancy time of a set of states 
is the sum of the mean occupancy times of each state, but that is not true for the 
variance or higher moments. Roth and Caswell (2018) derived a general expression 
for all the statistics, and the complete distribution, of occupancy time for any set 
of states. If more than one absorbing state exists (e.g., death at different stages, or
from different causes), then the risks of absorption compete, because an individual
can only be absorbed (i.e., die) once. It is possible to calculate the probability of
absorption in each state, and to explore the effects of changing one risk on the
probability of experiencing another (Caswell and Ouellette 2018). Passage times
refer to the time required to get from one stage to another in the life cycle. An 
important passage time is the birth interval: the time from one birth to the next. This 
can only be calculated for individuals that do reproduce a second time (otherwise 
the interval is infinite), and so it requires developing a chain that is conditional on 
successfully reaching the reproductive state (Caswell 2001). In species that produce 
only one or a few offspring, reproduction cannot be adjusted in response to the 
environment by changing offspring number, and so changes in the birth interval are 
particularly important in such species. 


A Appendix: Derivations 


This appendix contains derivations of many of the results in this
chapter, especially for sensitivities. Taking advantage of the freedom from length
limits, I have tried to show the derivations step-by-step. Recall the definitions of the
Hadamard product

$$A \circ B = \left( a_{ij} b_{ij} \right), \tag{5.84}$$

the Kronecker product

$$A \otimes B = \left( a_{ij} B \right), \tag{5.85}$$

the vec operator

$$\mathrm{vec} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ c \\ b \\ d \end{pmatrix}, \tag{5.86}$$

and Roth’s theorem

$$\mathrm{vec}\,(ABC) = \left( C^T \otimes A \right) \mathrm{vec}\, B. \tag{5.87}$$
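These four definitions map directly onto array operations. The following short sketch uses NumPy, where `*` is the Hadamard product, `np.kron` the Kronecker product, and column-major flattening plays the role of the vec operator; the helper name `vec` and the random test matrices are illustrative choices.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order), as in (5.86)."""
    return np.asarray(X).flatten(order="F")

rng = np.random.default_rng(0)
A = rng.random((2, 3))
B = rng.random((3, 4))
C = rng.random((4, 2))

small = vec(np.array([[1, 2],
                      [3, 4]]))          # columns stacked: 1, 3, 2, 4

# Roth's theorem (5.87): vec(ABC) = (C^T (x) A) vec B
lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
```

Note that the default flattening order in NumPy is row-major, so the `order="F"` argument is essential for the identities in this appendix to hold.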


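A.1 Variance in Occupancy Times

The occupancy time in transient state $i$, starting from transient state $j$, is $\nu_{ij}$. The
matrix of variances of the $\nu_{ij}$ is

$$V = \left( V(\nu_{ij}) \right) = \left( 2 N_{\mathrm{dg}} - I_s \right) N - N \circ N \tag{5.88}$$

(Caswell 2006, derived from Theorem 3.1 of Iosifescu 1980) where $N_{\mathrm{dg}}$ is a matrix
with the diagonal elements of $N$ on its diagonal and zeros elsewhere; it can be
written

$$N_{\mathrm{dg}} = I_s \circ N \tag{5.89}$$

Differentiating both sides of (5.88) gives

$$dV = 2 \left( I_s \circ dN \right) N + 2 \left( I_s \circ N \right) (dN) - dN - (dN) \circ N - N \circ (dN) \tag{5.90}$$

The next step is to apply the vec operator to both sides. The vec of a Hadamard
product can be written in two ways:

$$\mathrm{vec}\,(A \circ B) = D(\mathrm{vec}\, A)\, \mathrm{vec}\, B = D(\mathrm{vec}\, B)\, \mathrm{vec}\, A. \tag{5.91}$$

Using this result and Roth’s theorem (5.87) gives

$$
\begin{aligned}
d\mathrm{vec}\, V &= 2 \left( N^T \otimes I_s \right) D(\mathrm{vec}\, I_s)\, d\mathrm{vec}\, N + 2 \left[ I_s \otimes \left( I_s \circ N \right) \right] d\mathrm{vec}\, N \\
&\quad - d\mathrm{vec}\, N - 2 D(\mathrm{vec}\, N)\, d\mathrm{vec}\, N
\end{aligned}
\tag{5.92}
$$

Factoring out $d\mathrm{vec}\, N$ and using the chain rule gives the final result

$$
\frac{d\mathrm{vec}\, V}{d\theta^T} = \left[ 2 \left( N^T \otimes I_s \right) D(\mathrm{vec}\, I_s) + 2 \left( I_s \otimes N_{\mathrm{dg}} \right) - I_{s^2} - 2 D(\mathrm{vec}\, N) \right] \frac{d\mathrm{vec}\, N}{d\theta^T} \tag{5.93}
$$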
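One way to gain confidence in a result like (5.93) is to compare it with finite differences. The sketch below does so for a hypothetical three-stage transient matrix, taking $\theta = \mathrm{vec}\, U$ and using $d\mathrm{vec}\, N = (N^T \otimes N)\, d\mathrm{vec}\, U$ for the derivative of the fundamental matrix; the function names and numerical values are assumptions made for the example.

```python
import numpy as np

def vec(X):
    return X.flatten(order="F")

def occupancy_variance(U):
    """Matrix of variances of occupancy times, (5.88)."""
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    N_dg = np.diag(np.diag(N))
    return (2 * N_dg - np.eye(s)) @ N - N * N

def occupancy_variance_jacobian(U):
    """d vec V / d vec^T U: the bracket in (5.93) times (N^T (x) N)."""
    s = U.shape[0]
    I = np.eye(s)
    N = np.linalg.inv(I - U)
    N_dg = np.diag(np.diag(N))
    bracket = (2 * np.kron(N.T, I) @ np.diag(vec(I))
               + 2 * np.kron(I, N_dg)
               - np.eye(s * s)
               - 2 * np.diag(vec(N)))
    return bracket @ np.kron(N.T, N)

U = np.array([[0.2, 0.0, 0.0],
              [0.5, 0.4, 0.0],
              [0.0, 0.4, 0.8]])     # hypothetical transient matrix
J = occupancy_variance_jacobian(U)

# Central finite differences; k indexes vec U in column-major order.
h = 1e-6
J_fd = np.zeros_like(J)
for k in range(U.size):
    dU = np.zeros_like(U)
    dU[k % 3, k // 3] = h
    J_fd[:, k] = vec(occupancy_variance(U + dU) - occupancy_variance(U - dU)) / (2 * h)
```

Each column of `J` gives the sensitivity of every entry of $V$ to one entry of $U$; derivatives with respect to lower-level parameters follow by multiplying on the right by the derivative of $\mathrm{vec}\, U$ with respect to those parameters.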


A.2 Life Expectancy 


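Let $\eta_i$ be the time to absorption (i.e., death) of an individual currently in stage $i$.
The vector $E(\eta)$ of expected values of the $\eta_i$ satisfies

$$E(\eta)^T = 1^T N \tag{5.94}$$

where $1$ is a vector of ones. Differentiating both sides gives

$$dE(\eta)^T = 1^T (dN) \tag{5.95}$$

Applying the vec operator gives

$$dE(\eta) = \left( I_s \otimes 1^T \right) d\mathrm{vec}\, N \tag{5.96}$$

Applying the identification theorem and the chain rule, and using (5.16) for the
sensitivity of the fundamental matrix, gives

$$\frac{dE(\eta)}{d\theta^T} = \left( I_s \otimes 1^T \right) \left( N^T \otimes N \right) \frac{d\mathrm{vec}\, U}{d\theta^T} \tag{5.97}$$

This gives the derivative of the entire vector of life expectancies. Suppose that stage
1 corresponds to birth. The life expectancy at birth is then

$$E(\eta_1) = 1^T N e_1 \tag{5.98}$$

where $e_1$ is a vector with 1 in the first position and zeros elsewhere. Following the
same derivation gives

$$\frac{dE(\eta_1)}{d\theta^T} = \left( e_1^T \otimes 1^T \right) \left( N^T \otimes N \right) \frac{d\mathrm{vec}\, U}{d\theta^T} \tag{5.99}$$

A.3 Variance in Longevity

The variance of the time to absorption satisfies

$$V(\eta)^T = 1^T N (2N - I) - E(\eta)^T \circ E(\eta)^T \tag{5.100}$$

(Caswell 2006, derived from Theorem 3.2 of Iosifescu 1980). Differentiating gives

$$dV(\eta)^T = 2 \cdot 1^T (dN) N + 2 \cdot 1^T N (dN) - 1^T (dN) - 2 E(\eta)^T \circ dE(\eta)^T \tag{5.101}$$

Applying the vec operator and Roth’s theorem (5.87), using (5.91) for the vec of the
Hadamard product, gives

$$dV(\eta) = \left[ 2 \left( N^T \otimes 1^T \right) + 2 \left( I_s \otimes 1^T N \right) - \left( I_s \otimes 1^T \right) \right] d\mathrm{vec}\, N - 2 D\left( E(\eta) \right) dE(\eta) \tag{5.102}$$

Substituting (5.96) for $dE(\eta)$ gives

$$dV(\eta) = \left[ 2 \left( N^T \otimes 1^T \right) + 2 \left( I_s \otimes 1^T N \right) - \left( I_s \otimes 1^T \right) - 2 D\left( E(\eta) \right) \left( I_s \otimes 1^T \right) \right] d\mathrm{vec}\, N \tag{5.103}$$

Using (5.16) for the sensitivity of $N$, the identification theorem, and the chain rule
finally leads to

$$
\frac{dV(\eta)}{d\theta^T} = \left[ 2 \left( N^T \otimes 1^T \right) + 2 \left( I_s \otimes 1^T N \right) - \left( I_s \otimes 1^T \right) - 2 D\left( E(\eta) \right) \left( I_s \otimes 1^T \right) \right] \left( N^T \otimes N \right) \frac{d\mathrm{vec}\, U}{d\theta^T} \tag{5.104}
$$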
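The sensitivities of the mean and variance of longevity in (5.97) and (5.104) can be checked against finite differences in the same way. In this sketch $\theta = \mathrm{vec}\, U$, so the final factor $d\mathrm{vec}\, U / d\theta^T$ is an identity matrix; the transient matrix and the function names are hypothetical.

```python
import numpy as np

def longevity_mean_var(U):
    """E(eta)^T = 1^T N (5.94) and V(eta)^T from (5.100)."""
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    mean = np.ones(s) @ N
    var = mean @ (2 * N - np.eye(s)) - mean**2
    return mean, var

def longevity_jacobians(U):
    """Derivatives of E(eta) and V(eta) with respect to vec^T U."""
    s = U.shape[0]
    I, one_row = np.eye(s), np.ones((1, s))
    N = np.linalg.inv(I - U)
    mean = (one_row @ N).ravel()
    dN_dU = np.kron(N.T, N)                    # d vec N / d vec^T U
    sum_cols = np.kron(I, one_row)             # I_s (x) 1^T
    d_mean = sum_cols @ dN_dU                  # (5.97)
    bracket = (2 * np.kron(N.T, one_row)
               + 2 * np.kron(I, one_row @ N)
               - sum_cols
               - 2 * np.diag(mean) @ sum_cols)
    return d_mean, bracket @ dN_dU             # (5.104)

U = np.array([[0.2, 0.0, 0.0],
              [0.5, 0.4, 0.0],
              [0.0, 0.4, 0.8]])     # hypothetical transient matrix
d_mean, d_var = longevity_jacobians(U)

# Central finite differences; k indexes vec U in column-major order.
h = 1e-6
fd_mean, fd_var = np.zeros_like(d_mean), np.zeros_like(d_var)
for k in range(U.size):
    dU = np.zeros_like(U)
    dU[k % 3, k // 3] = h
    m_hi, v_hi = longevity_mean_var(U + dU)
    m_lo, v_lo = longevity_mean_var(U - dU)
    fd_mean[:, k] = (m_hi - m_lo) / (2 * h)
    fd_var[:, k] = (v_hi - v_lo) / (2 * h)
```

The first row of `d_mean` is the sensitivity of life expectancy at birth when stage 1 is the birth stage, which is the quantity given by (5.99).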


A.4 Net Reproductive Rate 


The net reproductive rate $R_0$ is given by the dominant eigenvalue of $FN$. Let $y$ and
$x$ be the right and left eigenvectors, respectively, of $FN$, corresponding to $R_0$. The
matrix calculus version of the standard eigenvalue perturbation result (e.g., Caswell
1978) gives

$$
\begin{aligned}
dR_0 &= x^T d(FN)\, y \\
&= x^T \left[ (dF) N + F (dN) \right] y
\end{aligned}
\tag{5.105}
$$

Applying the vec operator to both sides gives

$$dR_0 = \left( y^T N^T \otimes x^T \right) d\mathrm{vec}\, F + \left( y^T \otimes x^T F \right) d\mathrm{vec}\, N \tag{5.106}$$

Applying the chain rule and the result (5.14) for $d\mathrm{vec}\, N$ gives the sensitivity of $R_0$ in
terms of effects of the parameter vector $\theta$ on the fertility matrix $F$ and the transient
matrix $U$:

$$\frac{dR_0}{d\theta^T} = \left( y^T N^T \otimes x^T \right) \frac{d\mathrm{vec}\, F}{d\theta^T} + \left( y^T \otimes x^T F \right) \left( N^T \otimes N \right) \frac{d\mathrm{vec}\, U}{d\theta^T} \tag{5.107}$$
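A numerical sketch of (5.107) follows, for a hypothetical three-stage model in which only stage 1 offspring are produced. It returns the two coefficient matrices of (5.107), that is, the derivatives of $R_0$ with respect to $\mathrm{vec}\, F$ and $\mathrm{vec}\, U$. The eigenvectors are scaled so that $x^T y = 1$, which is assumed here as the normalization under which (5.105) holds; all names and values are illustrative.

```python
import numpy as np

def dominant(M):
    """Dominant eigenvalue and a real eigenvector of a nonnegative matrix."""
    vals, vecs = np.linalg.eig(M)
    k = np.argmax(np.real(vals))
    return np.real(vals[k]), np.real(vecs[:, k])

def R0_and_sensitivities(U, F):
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    R0, y = dominant(F @ N)              # right eigenvector
    _, x = dominant((F @ N).T)           # left eigenvector
    x = x / (x @ y)                      # assumed scaling: x^T y = 1
    dR0_dvecF = np.kron(y @ N.T, x)                      # y^T N^T (x) x^T
    dR0_dvecU = np.kron(y, x @ F) @ np.kron(N.T, N)      # (y^T (x) x^T F)(N^T (x) N)
    return R0, dR0_dvecF, dR0_dvecU

U = np.array([[0.2, 0.0, 0.0],
              [0.5, 0.4, 0.0],
              [0.0, 0.4, 0.8]])      # hypothetical transient matrix
F = np.array([[0.0, 1.0, 2.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])      # hypothetical fertility matrix
R0, dR0_dvecF, dR0_dvecU = R0_and_sensitivities(U, F)
```

Both outputs are vectors of length $s^2$ ordered like $\mathrm{vec}\, F$ and $\mathrm{vec}\, U$, so the sensitivity to the $(i, j)$ entry is found at position $(j - 1)s + i$ in one-based indexing.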


A.5 Cohort Generation Time 


To derive the cohort generation time, we begin at time $t = 0$ with an individual
newly born in stage $j$. This tiny cohort is described by an initial vector $e_j$. The
expected survivors of this cohort at time $t$ are $U^t e_j$. The expected offspring produced
by these survivors at time $t$ are $F U^t e_j$. Summing over the lifetime of the cohort gives
a vector of expected lifetime reproduction, of all types of offspring,

$$
\begin{aligned}
E(\text{total offspring}) &= \sum_{t=0}^{\infty} F U^t e_j \\
&= F \left( \sum_{t=0}^{\infty} U^t \right) e_j \\
&= F N e_j
\end{aligned}
\tag{5.108}
$$

Let $m^{(j)}(t)$ be the vector of offspring production at time $t$, expressed as a proportion
of the lifetime total of the individual starting in stage $j$. Then

$$m^{(j)}(t) = D\left( F N e_j \right)^{-1} \left( F U^t e_j \right) \tag{5.109}$$

If no offspring of some stage, say stage $i$, are produced, then set $m_i^{(j)}(t) = 0$.

The cohort generation time $\mu^{(j)}$ is the expectation of the distribution defined by
$m^{(j)}(t)$:

$$
\begin{aligned}
\mu^{(j)} &= \sum_{x=0}^{\infty} x\, m^{(j)}(x) \\
&= \sum_{x} D\left( F N e_j \right)^{-1} x F U^x e_j \\
&= D\left( F N e_j \right)^{-1} F \left( \sum_{x} x U^x \right) e_j.
\end{aligned}
\tag{5.110}
$$

The summation can be simplified

$$
\begin{aligned}
\sum_{x} x U^x &= 0 + U + 2U^2 + 3U^3 + \cdots \\
&= U \left[ 0 + U + 2U^2 + \cdots + I + U + U^2 + \cdots \right] \\
&= U \left[ \sum_{x} x U^x + N \right]
\end{aligned}
\tag{5.111}
$$

Solving this gives

$$\sum_{x} x U^x = N U N. \tag{5.112}$$

Putting all the pieces together gives the generation time

$$\mu^{(j)} = D\left( F N e_j \right)^{-1} F N U N e_j. \tag{5.113}$$
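The closed form (5.112) and the generation time (5.113) are easy to check numerically. The sketch below compares a truncated version of the series with $NUN$ and then evaluates $\mu^{(j)}$ for a hypothetical life cycle; offspring types that are never produced are assigned a generation time of zero, following the convention stated after (5.109). The function name and matrices are assumptions for illustration.

```python
import numpy as np

def cohort_generation_time(U, F, j):
    """mu^(j) from (5.113) for a cohort born in stage j (0-based index)."""
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    e_j = np.eye(s)[:, j]
    lifetime = F @ N @ e_j              # expected lifetime offspring by type, (5.108)
    weighted = F @ N @ U @ N @ e_j      # sum over x of x F U^x e_j
    mu = np.zeros(s)
    produced = lifetime > 0             # offspring types never produced keep mu = 0
    mu[produced] = weighted[produced] / lifetime[produced]
    return mu

U = np.array([[0.2, 0.0, 0.0],
              [0.5, 0.4, 0.0],
              [0.0, 0.4, 0.8]])      # hypothetical transient matrix
F = np.array([[0.0, 1.0, 2.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])      # hypothetical fertility matrix

# Truncated series sum_x x U^x versus the closed form N U N of (5.112).
N = np.linalg.inv(np.eye(3) - U)
series = np.zeros((3, 3))
power = np.eye(3)
for x in range(1, 500):
    power = power @ U
    series += x * power
closed_form = N @ U @ N

mu = cohort_generation_time(U, F, j=0)
```

For a single-stage model with survival probability 0.5, the same function returns a generation time of 1, the mean of the geometric distribution of ages at reproduction in that case.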


A.5.1 Sensitivity of Generation Time 


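To differentiate (5.113) may seem complicated. To make life easier, define some
notation,

$$X = D\left( F N e_j \right) \tag{5.114}$$

$$r = F N U N e_j \tag{5.115}$$

in terms of which (5.113) becomes

$$\mu^{(j)} = X^{-1} r. \tag{5.116}$$

Differentiate,

$$d\mu^{(j)} = d\left( X^{-1} \right) r + X^{-1} dr \tag{5.117}$$

and apply the vec operator

$$d\mu^{(j)} = \left( r^T \otimes I \right) d\mathrm{vec}\, X^{-1} + X^{-1} d\mathrm{vec}\, r. \tag{5.118}$$

The same steps that led to Eq. (5.14) for $d\mathrm{vec}\, N$, and noting that $X$ is symmetric,
leads to

$$d\mathrm{vec}\, X^{-1} = -\left( X^{-1} \otimes X^{-1} \right) d\mathrm{vec}\, X. \tag{5.119}$$

The differential of $\mathrm{vec}\, X$ is obtained by writing

$$X = I \circ \left( F N e_j 1^T \right). \tag{5.120}$$

Differentiating and using the rule (5.91) for the vec of a Hadamard product gives

$$d\mathrm{vec}\, X = D(\mathrm{vec}\, I) \left[ \left( 1 e_j^T N^T \otimes I \right) d\mathrm{vec}\, F + \left( 1 e_j^T \otimes F \right) d\mathrm{vec}\, N \right] \tag{5.121}$$

Differentiating $r$ and applying the vec operator gives

$$
\begin{aligned}
d\mathrm{vec}\, r &= \left[ \left( N U N e_j \right)^T \otimes I \right] d\mathrm{vec}\, F + \left[ \left( U N e_j \right)^T \otimes F \right] d\mathrm{vec}\, N \\
&\quad + \left[ \left( N e_j \right)^T \otimes F N \right] d\mathrm{vec}\, U + \left[ e_j^T \otimes F N U \right] d\mathrm{vec}\, N
\end{aligned}
\tag{5.122}
$$

Whew!
Finally, substituting (5.119), (5.121) and (5.122) into (5.117), we obtain

$$
\begin{aligned}
\frac{d\mu^{(j)}}{d\theta^T} &= -\left( r^T \otimes I \right) \left( X^{-1} \otimes X^{-1} \right) D(\mathrm{vec}\, I) \left[ \left( 1 e_j^T N^T \otimes I \right) \frac{d\mathrm{vec}\, F}{d\theta^T} + \left( 1 e_j^T \otimes F \right) \frac{d\mathrm{vec}\, N}{d\theta^T} \right] \\
&\quad + X^{-1} \left\{ \left[ \left( N U N e_j \right)^T \otimes I \right] \frac{d\mathrm{vec}\, F}{d\theta^T} + \left[ \left( U N e_j \right)^T \otimes F \right] \frac{d\mathrm{vec}\, N}{d\theta^T} \right. \\
&\quad \left. + \left[ \left( N e_j \right)^T \otimes F N \right] \frac{d\mathrm{vec}\, U}{d\theta^T} + \left[ e_j^T \otimes F N U \right] \frac{d\mathrm{vec}\, N}{d\theta^T} \right\}
\end{aligned}
\tag{5.123}
$$

This may be an impressive formula, but it is straightforward to compute, given the
derivatives of $U$, $F$, and $N$ with respect to $\theta$.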
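To illustrate that the computation is indeed mechanical, here is a sketch of (5.123) for the special case $\theta = \mathrm{vec}\, U$ with $F$ held fixed, so that the terms in $d\mathrm{vec}\, F$ vanish and $d\mathrm{vec}\, N / d\theta^T = N^T \otimes N$. The matrices are hypothetical, and the fertility matrix is chosen so that every offspring type is produced (otherwise $X$ is singular and the convention following (5.109) would be needed).

```python
import numpy as np

def vec(X):
    return X.flatten(order="F")

def generation_time(U, F, j):
    """mu^(j) = X^{-1} r, as in (5.116); assumes all offspring types are produced."""
    s = U.shape[0]
    N = np.linalg.inv(np.eye(s) - U)
    e_j = np.eye(s)[:, j]
    return (F @ N @ U @ N @ e_j) / (F @ N @ e_j)

def generation_time_jacobian_U(U, F, j):
    """d mu^(j) / d vec^T U from (5.123), with theta = vec U and F fixed."""
    s = U.shape[0]
    I = np.eye(s)
    N = np.linalg.inv(I - U)
    e_j = I[:, [j]]                                   # s x 1 column
    ones = np.ones((s, 1))
    X_inv = np.diag(1.0 / (F @ N @ e_j).ravel())
    r = F @ N @ U @ N @ e_j
    dN = np.kron(N.T, N)                              # d vec N / d vec^T U
    dX = np.diag(vec(I)) @ np.kron(ones @ e_j.T, F) @ dN          # from (5.121)
    dr = (np.kron((U @ N @ e_j).T, F) @ dN                        # from (5.122)
          + np.kron((N @ e_j).T, F @ N)
          + np.kron(e_j.T, F @ N @ U) @ dN)
    return -np.kron(r.T, I) @ np.kron(X_inv, X_inv) @ dX + X_inv @ dr

U = np.array([[0.2, 0.0, 0.0],
              [0.5, 0.4, 0.0],
              [0.0, 0.4, 0.8]])      # hypothetical transient matrix
F = np.array([[0.0, 1.0, 2.0],
              [0.0, 0.3, 0.5],
              [0.0, 0.1, 0.4]])      # hypothetical: three offspring types
J = generation_time_jacobian_U(U, F, j=0)
```

Derivatives with respect to entries of $F$, or to lower-level parameters affecting both matrices, would add the remaining terms of (5.123) in the same pattern.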
Chapter 6
Age × Stage-Classified Models


6.1 Introduction 


The first step in developing any kind of structured population model is choosing one 
or more variables in terms of which to describe the population structure. The job of 
these i-state variables is to encapsulate all the information about the past experience 
of an individual that is relevant to its future behavior (Metz and Diekmann 1986; 
Caswell 2001). Classical demography (for both humans and for non-human animals
and plants) uses age as an i-state, but other, more biologically relevant criteria (e.g.,
size, developmental stage, parity, physiological condition, etc.) are now widely used
in ecology, with age-classified models viewed as a special case.

However, it has long been recognized that cases exist where it is important to 
classify individuals by both age and stage. 


1. Even in a stage-classified model, age still exists; every individual becomes older, 
by one unit of age, with the passage of each unit of time (e.g., Feichtinger 
1971a; Caswell 2001, 2006, 2009; Tuljapurkar and Horvitz 2006; Horvitz and 
Tuljapurkar 2008). In these analyses, age dependence is implicit in the stage- 
classified model (see Chap. 5). Models that include both age and stage provide 
information that goes beyond this implicit dependence. 

2. If the vital rates depend on both age and stage, only a model that includes both
can reveal the joint action of age- and stage-specific processes (e.g., Goodman
1969). Such models, of course, require information on the joint age-dependence
and stage-dependence of the vital rates, and thus are challenging to construct.
A special case that has been extensively explored is the multi-regional case, in
which the stage variable describes spatial location (e.g., Rogers 1966; Lebreton
1996). Models that combine age and some measure of health or disability status
are an important part of health demography (e.g., Willekens 2014; Peeters et al.
2002; Wu et al. 2006; Zhou et al. 2016).


This chapter presents a model framework in which individuals are classified
by age and stage, using the vec-permutation matrix approach (so-called for the
role that the vec-permutation matrix plays in rearranging age and stage categories
in the population vector). This formalism was introduced by Hunter and Caswell
(2005) for populations classified by stage and location, and was used in Chap. 5 to
classify individuals by stage and environmental state; it has also been applied to
stage and infection status (Klepac and Caswell 2011), stage and age (Caswell
2012; Caswell et al. 2018), and age and frailty (Caswell 2014). Megamatrix
models (e.g., Pascarella and Horvitz 1998; Horvitz and Tuljapurkar 2008) can
be written using this approach, as can block-structured multiregional models
(e.g., Rogers 1975; Lebreton 1996).

Matrix models can describe both population dynamics and cohort dynamics.
Population dynamics (population growth,
age and stage structure, reproductive value) depend on both the transitions of
extant individuals and the production of new individuals by reproduction. In
contrast, cohort dynamics (survivorship, life expectancy, age at death, generation
time) depend only on the fates of already existing individuals. This chapter
describes both kinds of analysis. For a more complete review and treatment,
see Caswell et al. (2018).


6.2 Model Construction 


The construction and analysis of these models requires a number of different
matrices and operators (some of the notation is collected in Table 6.1).
Individuals are classified into stages $1, \ldots, s$ and age classes $1, \ldots, \omega$. The
model treats the processes of moving among stages and moving among age
classes as alternating. First, stage-specific demography operates to move individuals
among stages and to produce new offspring, with rates appropriate to their ages.
Then aging acts to move individuals to the next older age, and the process
repeats.

Define a stage-classified projection matrix $A_i$, of dimension $s \times s$, for each age
class, $i = 1, \ldots, \omega$. Decompose $A_i$ into

$$A_i = U_i + F_i \tag{6.1}$$

where $U_i$ contains the transition probabilities of extant individuals and $F_i$ describes
the generation of new individuals by reproduction.

Table 6.1 Mathematical notation used in this chapter. Dimensions are shown, where relevant, for
matrices and vectors; $s$ denotes the number of stages and $\omega$ the number of age classes

Quantity | Description | Dimension
$A_i$, $F_i$, $U_i$ | Stage-classified projection, fertility, and transition matrices for age class $i$ | $s \times s$
$D_U$, $D_F$ | Age transition matrices for individuals already present in the population and for new individuals produced by reproduction | $\omega \times \omega$
$\mathbb{A}$, $\mathbb{F}$, $\mathbb{U}$, $\mathbb{D}$ | Block-diagonal matrices | $s\omega \times s\omega$
$\tilde{A}$, $\tilde{U}$, etc. | Age × stage matrices constructed from block-diagonal matrices using the vec-permutation matrix | $s\omega \times s\omega$
$K_{s,\omega}$ | Vec-permutation matrix | $s\omega \times s\omega$
$I_s$ | Identity matrix | $s \times s$
$1_s$ | Vector of ones | $s \times 1$
$e_i$ | The $i$th unit vector, with a 1 in the $i$th entry and zeros elsewhere | Various
$E_{ij}$ | A matrix with a 1 in the $(i, j)$ position, and zeros elsewhere | Various
$\otimes$ | Kronecker product |
$\circ$ | Hadamard, or element-by-element, product |
$\mathrm{vec}\, X$ | The vec operator, which stacks the columns of an $m \times n$ matrix $X$ into an $mn \times 1$ vector |
$D(x)$ | A diagonal matrix with $x$ on the diagonal and zeros elsewhere |

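Aging is described by two matrices, each of dimension $\omega \times \omega$ (shown here for
$3 \times 3$, but easily generalized),

$$D_U = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix} \tag{6.2}$$

$$D_F = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{6.3}$$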

The matrix $D_U$ applies to extant individuals; such an individual advances to the next
age class. I have set the $(\omega, \omega)$ entry of $D_U$ to 1, so that the last age class contains
individuals of age $\omega$ and older. If this entry were set to 0, all individuals in the
last age class would die. The matrix $D_F$ applies to individuals newly created by
reproduction; such newborn individuals are placed in the first age class, regardless
of the age of their parents.

Using the matrices $A_i$, $U_i$, $F_i$, $D_U$, and $D_F$, construct block-diagonal matrices,
each of dimension $s\omega \times s\omega$. For example,

$$\mathbb{A} = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_{\omega} \end{pmatrix} \tag{6.4}$$

with similar structures for $\mathbb{U}$, $\mathbb{F}$, $\mathbb{D}_U$, and $\mathbb{D}_F$. These block-diagonal matrices can be
written

$$\mathbb{A} = \sum_{i=1}^{\omega} \left( E_{ii} \otimes A_i \right) \tag{6.5}$$

$$\mathbb{U} = \sum_{i=1}^{\omega} \left( E_{ii} \otimes U_i \right) \tag{6.6}$$

$$\mathbb{F} = \sum_{i=1}^{\omega} \left( E_{ii} \otimes F_i \right) \tag{6.7}$$

$$\mathbb{D}_U = I_s \otimes D_U \tag{6.8}$$

$$\mathbb{D}_F = I_s \otimes D_F \tag{6.9}$$

where $E_{ii}$ is of dimension $\omega \times \omega$.
If the demography is strictly stage-dependent, so that $A_i = A$, for $i = 1, \ldots, \omega$,
then the block-diagonal matrices $\mathbb{A}$, $\mathbb{F}$, and $\mathbb{U}$ reduce to, e.g.,

$$\mathbb{A} = I_{\omega} \otimes A \tag{6.10}$$

with corresponding expressions for $\mathbb{F}$ and $\mathbb{U}$.
The state of the population at time ¢ could be described by a 2-dimensional array 


N11 °°" Nig 
NO=| : : [O sxo (6.11) 


Ns1 +t Nsw 


where rows correspond to stages and columns to age classes. However, such a 2- 
dimensional array cannot be projected directly; instead, it is transformed to a vector, 


nil 


nA =veeN(t) = |: [0 sox (6.12) 
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using the vec operator, which stacks the columns of the matrix one above the 
next. The vector n(t) created in this way contains the stages arranged within age 
classes. An alternative configuration, with ages arranged within stages, is obtained 
by applying the vec operator to. M T, 


Nia 


veeNT(th=] : [0 sox (6.13) 


Nsw 


The two vectors vec N and vec NT are related by the vec-permutation matrix, or 
commutation matrix, K, (Henderson and Searle 1981), 


vec NT = K, oyec N (6.14) 


(see Sect. 2.2.3). Where no confusion seems likely to arise, we will suppress the 
subscripts and write Ks œ as K. As with any permutation matrix, K'=K!. 

The goal of the model is to project the age-stage vector n = vec M from ż to 
t + 1. The complete projection is given by 


n(t + 1) = (K"ouku + K'DpK ') n(t) (6.15) 


This deserves some explanation. Consider the first term on the right hand side, 
K' DyKU. Reading from right to left, it first operates on the vector n(f) with the 
block diagonal matrix U, which moves surviving extant individuals among stages 
without changing their age. Then the resulting vector is rearranged by the vec- 
permutation matrix K to group individuals by age classes within each stage. The 
block diagonal matrix Dy then moves each surviving individual to the next older age 
class. Finally, K! rearranges the vector back to the stage-within-age arrangement of 
n(t). 

The second term in (6.15), K'DpK *", carries out a similar sequence of trans- 
formations for the generation of new individuals. First, newborn individuals are 
produced according to the block-diagonal fertility matrix F. The resulting vector 
is rearranged by the vec-permutation matrix, and then the matrix Dp places all the 
newborn individuals into the first age class. Finally, K! rearranges the vector to the 
stage-within-age arrangement. 
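
I will write the age × stage projection matrix in (6.15) as

\begin{align}
\tilde{\mathbf{A}} &= \left( \mathbf{K}^{\mathsf T} \mathbb{D}_U \mathbf{K} \mathbb{U} + \mathbf{K}^{\mathsf T} \mathbb{D}_F \mathbf{K} \mathbb{F} \right) \tag{6.16} \\
&= \left( \tilde{\mathbf{U}} + \tilde{\mathbf{F}} \right) \tag{6.17}
\end{align}

The matrices $\mathbf{A}$, $\mathbf{U}$, and $\mathbf{F}$ that operate on the age-stage vector $\mathbf{n}$ are denoted with
a tilde ($\tilde{\mathbf{A}}$, $\tilde{\mathbf{U}}$, $\tilde{\mathbf{F}}$); these matrices define the age × stage-classified model and can be
subjected to all the usual demographic analyses.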
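
The construction in (6.15)-(6.17) can be sketched numerically. The following is a minimal illustration, not the author's implementation: the two-stage, three-age matrices are hypothetical, the helper names (`vec`, `vecperm`, `block_diag`, `age_stage_matrices`) are invented here, and the forms assumed for the aging matrices (a subdiagonal of ones with an open-ended last age class for $\mathbf{D}_U$, a first row of ones for $\mathbf{D}_F$) are assumptions standing in for (6.2) and (6.3).

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def vecperm(m, n):
    """Vec-permutation matrix K with K @ vec(X) == vec(X.T) for an m-by-n X."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # X[i, j] sits at i + j*m in vec(X) and at j + i*n in vec(X.T)
            K[j + i * n, i + j * m] = 1.0
    return K

def block_diag(blocks):
    """Place equally sized square blocks along the diagonal."""
    k = blocks[0].shape[0]
    out = np.zeros((k * len(blocks), k * len(blocks)))
    for i, B in enumerate(blocks):
        out[i * k:(i + 1) * k, i * k:(i + 1) * k] = B
    return out

def age_stage_matrices(U_list, F_list, D_U, D_F):
    """Return (U_tilde, F_tilde, A_tilde) following (6.15)-(6.17)."""
    omega, s = len(U_list), U_list[0].shape[0]
    K = vecperm(s, omega)
    bU, bF = block_diag(U_list), block_diag(F_list)  # omega blocks, each s x s
    bDU = block_diag([D_U] * s)                       # s blocks, each omega x omega
    bDF = block_diag([D_F] * s)
    U_t = K.T @ bDU @ K @ bU
    F_t = K.T @ bDF @ K @ bF
    return U_t, F_t, U_t + F_t

# Hypothetical example: s = 2 stages, omega = 3 age classes, age-specific rates.
s, omega = 2, 3
U_list = [np.array([[0.2, 0.0], [0.3, 0.7]]) * (1.0 - 0.1 * a) for a in range(omega)]
F_list = [np.array([[0.0, 2.0], [0.0, 0.0]]) * (a + 1) for a in range(omega)]
D_U = np.diag(np.ones(omega - 1), -1)  # age a -> age a + 1
D_U[-1, -1] = 1.0                       # assumption: open-ended last age class
D_F = np.zeros((omega, omega))
D_F[0, :] = 1.0                         # all newborns enter age class 1

U_t, F_t, A_t = age_stage_matrices(U_list, F_list, D_U, D_F)
N = np.array([[10.0, 4.0, 1.0], [0.0, 6.0, 3.0]])  # rows: stages, columns: ages
n_next = A_t @ vec(N)                                # one projection step
```

Here `vec(N)` arranges stages within age classes, matching the ordering of $\mathbf{n}(t)$ used in the text.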


6.3 Sensitivity Analysis 


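Age-stage models pose particular challenges for perturbation analysis, because
interest naturally focuses on changes in the matrices $\mathbf{F}_i$ and $\mathbf{U}_i$ ($i = 1, \ldots, \omega$),
which are deeply embedded within $\tilde{\mathbf{F}}$, $\tilde{\mathbf{U}}$, and $\tilde{\mathbf{A}}$.

Consider a generic dependent variable $\boldsymbol{\xi}$, which is a scalar- or vector-valued
function of $\tilde{\mathbf{A}}$. In the examples to follow, $\boldsymbol{\xi}$ will be either the population growth
rate $\lambda$ or the joint distribution of age and stage at death in a cohort, but it could
be any variable calculated from $\tilde{\mathbf{A}}$. Let $\boldsymbol{\theta}$ be a vector of parameters; these could
be entries of the matrices, or lower-level parameters determining those entries.
The goal of perturbation analysis is to obtain the derivative of $\boldsymbol{\xi}$ with respect
to $\boldsymbol{\theta}$,

\[
\frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf T}} =
\frac{d\boldsymbol{\xi}}{d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}} \,
\frac{d\operatorname{vec} \tilde{\mathbf{A}}}{d\boldsymbol{\theta}^{\mathsf T}}
\tag{6.18}
\]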


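The first term in (6.18) is the derivative of $\boldsymbol{\xi}$ with respect to the matrix $\tilde{\mathbf{A}}$. If, for
example, $\boldsymbol{\xi}$ was the dominant eigenvalue $\lambda$, then this term would be the matrix
calculus version of the well-known eigenvalue sensitivity equation.

The second term in (6.18) requires differentiating $\tilde{\mathbf{A}}$ with respect to the
parameters that determine it. From (6.16), write

\[
\tilde{\mathbf{A}} = \mathbf{Q}_U \mathbb{U} + \mathbf{Q}_F \mathbb{F}
\tag{6.19}
\]

where $\mathbf{Q}_U = \mathbf{K}^{\mathsf T} \mathbb{D}_U \mathbf{K}$ and $\mathbf{Q}_F = \mathbf{K}^{\mathsf T} \mathbb{D}_F \mathbf{K}$ are the (constant) matrix products
appearing in the definition of $\tilde{\mathbf{U}}$ and $\tilde{\mathbf{F}}$ in (6.16).

Differentiating $\tilde{\mathbf{A}}$ in (6.19) gives

\[
d\operatorname{vec} \tilde{\mathbf{A}} = \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_U \right) d\operatorname{vec} \mathbb{U}
+ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_F \right) d\operatorname{vec} \mathbb{F}
\tag{6.20}
\]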
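
This requires the differentials of $\mathbb{U}$ and $\mathbb{F}$. Differentiating $\mathbb{U}$ in (6.6) gives

\[
d\mathbb{U} = \sum_{i=1}^{\omega} \left( \mathbf{E}_{ii} \otimes d\mathbf{U}_i \right)
\tag{6.21}
\]

Applying the vec operator to $d\mathbb{U}$ gives

\[
d\operatorname{vec} \mathbb{U} = \sum_{i=1}^{\omega}
\left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_s \right)
\left( \operatorname{vec} \mathbf{I}_\omega \otimes \mathbf{I}_{s^2} \right) d\operatorname{vec} \mathbf{U}_i
\tag{6.22}
\]

using the results of Magnus and Neudecker (1985, Theorem 11); see also Klepac
and Caswell (2011, Appendix B) on the derivative of the Kronecker product.
Differentiation of $\mathbb{F}$ proceeds in the same fashion, yielding

\[
d\operatorname{vec} \mathbb{F} = \sum_{i=1}^{\omega}
\left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_s \right)
\left( \operatorname{vec} \mathbf{I}_\omega \otimes \mathbf{I}_{s^2} \right) d\operatorname{vec} \mathbf{F}_i
\tag{6.23}
\]

In the special case where $\mathbb{U}$ and $\mathbb{F}$ are constructed from single stage-classified
matrices $\mathbf{U}$ and $\mathbf{F}$, as in (6.10), Eqs. (6.22) and (6.23) simplify even further to

\begin{align}
d\operatorname{vec} \mathbb{U} &= \left( \mathbf{I}_\omega \otimes \mathbf{K} \otimes \mathbf{I}_s \right)
\left( \operatorname{vec} \mathbf{I}_\omega \otimes \mathbf{I}_{s^2} \right) d\operatorname{vec} \mathbf{U} \tag{6.24} \\
d\operatorname{vec} \mathbb{F} &= \left( \mathbf{I}_\omega \otimes \mathbf{K} \otimes \mathbf{I}_s \right)
\left( \operatorname{vec} \mathbf{I}_\omega \otimes \mathbf{I}_{s^2} \right) d\operatorname{vec} \mathbf{F} \tag{6.25}
\end{align}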


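Substituting (6.22) and (6.23) into (6.20) and then substituting (6.20) into (6.18)
yields the general result for the derivative

\begin{align}
\frac{d\boldsymbol{\xi}}{d\boldsymbol{\theta}^{\mathsf T}} ={}&
\frac{d\boldsymbol{\xi}}{d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}}
\left[ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_U \right) \sum_{i=1}^{\omega}
\left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_s \right)
\left( \operatorname{vec} \mathbf{I}_\omega \otimes \mathbf{I}_{s^2} \right)
\frac{d\operatorname{vec} \mathbf{U}_i}{d\boldsymbol{\theta}^{\mathsf T}} \right] \nonumber \\
&+ \frac{d\boldsymbol{\xi}}{d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}}
\left[ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_F \right) \sum_{i=1}^{\omega}
\left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_s \right)
\left( \operatorname{vec} \mathbf{I}_\omega \otimes \mathbf{I}_{s^2} \right)
\frac{d\operatorname{vec} \mathbf{F}_i}{d\boldsymbol{\theta}^{\mathsf T}} \right]
\tag{6.26}
\end{align}

Notice that (6.26) requires only three pieces of demographic information: the
derivatives of $\mathbf{U}_i$ and $\mathbf{F}_i$ with respect to the parameters (whatever those may be in
the case at hand) and the sensitivity of the dependent variable $\boldsymbol{\xi}$ (whatever that may
be) to the elements of the matrix $\tilde{\mathbf{A}}$ from which it is calculated. All the other pieces
of (6.26) are constants. Some of these constant matrices may be large, depending
on $s$ and $\omega$, but they are very sparse; the sparse matrix technology available in
MATLAB can be extremely useful in implementation. An alternative formulation of
the differentials of the block matrices $\mathbb{U}$ and $\mathbb{F}$ is given in Caswell and van Daalen
(2016).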
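
Because $\mathbb{U}$ is linear in the $\mathbf{U}_i$, the constant matrices in (6.22) must map $\operatorname{vec} \mathbf{U}_i$ directly onto $\operatorname{vec} \mathbb{U}$, which gives a simple numerical check. The sketch below uses hypothetical dimensions and random stand-in matrices, with helper names invented for the illustration; it uses dense NumPy arrays for brevity, whereas a sparse representation (for example via `scipy.sparse`) would be the natural choice for large $s$ and $\omega$.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def vecperm(m, n):
    """Vec-permutation matrix K with K @ vec(X) == vec(X.T) for an m-by-n X."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + i * n, i + j * m] = 1.0
    return K

s, omega = 2, 3                                      # hypothetical dimensions
rng = np.random.default_rng(1)
U_list = [rng.random((s, s)) for _ in range(omega)]  # stand-ins for the U_i

K = vecperm(s, omega)
# Constant right-hand factor shared by every term: vec(I_omega) (x) I_{s^2}
right = np.kron(vec(np.eye(omega))[:, None], np.eye(s * s))

bU = np.zeros((s * omega, s * omega))
vec_bU = np.zeros((s * omega) ** 2)
for i, U_i in enumerate(U_list):
    E_ii = np.zeros((omega, omega))
    E_ii[i, i] = 1.0
    bU += np.kron(E_ii, U_i)                      # block-diagonal matrix, as in (6.21)
    left = np.kron(np.kron(E_ii, K), np.eye(s))   # E_ii (x) K (x) I_s
    vec_bU += left @ right @ vec(U_i)             # one term of (6.22)

# The step behind (6.20): vec(Q X) = (I (x) Q) vec(X) for a constant Q
Q = rng.random((s * omega, s * omega))
lhs = vec(Q @ bU)
rhs = np.kron(np.eye(s * omega), Q) @ vec_bU
```

If the reconstruction is right, `vec_bU` equals `vec(bU)` and `lhs` equals `rhs` to rounding error.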
6.4 Examples


Here we consider two examples of the sensitivity analysis of age-stage models to
extract age-classified information from a stage-classified model. The first example
will derive the sensitivity of the population growth rate $\lambda$, obtaining the sensitivity of
$\lambda$ to both age- and stage-specific survival, permitting examination of how selection
pressures on senescence-inducing traits would vary from stage to stage. The second
example is an analysis of the joint distribution of age and stage at death.

These examples are based on a stage-classified model (Parker 2000) for Scotch 
broom (Cytisus scoparius). Scotch broom is a large (up to 4 m tall) leguminous
shrub, introduced into North America from Europe in the late nineteenth century. 
It is an invasive plant, considered a pest in the northwestern parts of North 
America. Stage-classified demographic models have been used to evaluate potential 
management policies for the plant (Parker 2000) and to investigate its potential for 
spatial spread (Neubert and Parker 2004). 

The model contains seven stages (stage 1 = seeds, 2 = seedlings, 3 = juveniles,
4 = small adults, 5 = medium adults, 6 = large adults, 7 = extra-large adults), and
parameters were estimated at a number of locations in Washington State. As is 
typical with many perennial plant species, survival is low for seeds and seedlings, 
but increases dramatically in larger stages. Parker’s study presented estimated 
projection matrices for plants at the edge, at intermediate locations, and at the 
center of an invading stand. Plants near the center experience more crowding, with 
resulting reduced rates of survival, growth, and fertility. 


6.4.1 Population Growth Rate and Selection Gradients 


The population growth rate $\lambda$, the stable age or stage distribution $\mathbf{w}$, and age or
stage-specific reproductive value vector $\mathbf{v}$ are given by the dominant eigenvalue
and corresponding right and left eigenvectors of the population projection matrix,
respectively. In evolutionary demography, $\lambda$ measures the fitness of a phenotype,
in that it gives the eventual rate at which descendants of an individual with that
phenotype will increase. The selection gradient on a vector of traits $\boldsymbol{\theta}$ is given by

\[
\frac{d\lambda}{d\boldsymbol{\theta}^{\mathsf T}}
\tag{6.27}
\]

These gradients play a fundamental role in evolutionary biodemography, whether 
evolution is conceived of in terms of population genetics, quantitative genetics, 
adaptive dynamics, or mutation accumulation (e.g., Metz et al. 1992; Dercole and 
Rinaldi 2008; Rice 2004; Barfield et al. 2011). If the gradient is positive, selection 
favors an increase in the trait, and vice-versa. 
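
In this application, $\boldsymbol{\xi}$ in (6.18) is the dominant eigenvalue $\lambda$. Let $\mathbf{w}$ and $\mathbf{v}$ be
the right and left eigenvectors corresponding to $\lambda$, scaled so that $\mathbf{v}^{\mathsf T} \mathbf{w} = 1$. Then,
in (6.26),

\[
\frac{d\lambda}{d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}} = \mathbf{w}^{\mathsf T} \otimes \mathbf{v}^{\mathsf T}
\tag{6.28}
\]

See Chap. 3 and Caswell (2010).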
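
As a quick check of (6.28), the Kronecker product $\mathbf{w}^{\mathsf T} \otimes \mathbf{v}^{\mathsf T}$ can be computed for a small matrix and compared with finite differences of the dominant eigenvalue. The 3 × 3 matrix below is hypothetical, and `dominant_eig` is a helper written for this sketch rather than a routine from the text.

```python
import numpy as np

def dominant_eig(A):
    """Dominant eigenvalue with right (w) and left (v) eigenvectors, v @ w == 1."""
    vals, W = np.linalg.eig(A)
    w = W[:, np.argmax(vals.real)].real
    vals_T, V = np.linalg.eig(A.T)
    k = np.argmax(vals_T.real)
    v = V[:, k].real
    w = w / w.sum()      # fixes the sign and scales w to sum to 1
    v = v / (v @ w)      # scales v so that v^T w = 1
    return vals_T[k].real, w, v

# Hypothetical non-negative, primitive projection matrix.
A = np.array([[0.0, 1.5, 2.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.8, 0.9]])

lam, w, v = dominant_eig(A)
# Row vector d(lambda)/d(vec^T A): the entry for a_ij (position i + j*s) is v_i * w_j
sens = np.kron(w, v)
```

Reshaping `sens` with `sens.reshape(A.shape, order="F")` recovers the familiar sensitivity matrix with entries $v_i w_j$.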

In this model, the vital rates are functions only of stage; the phenotype is blind 
to the age of the individual. However, the terms in the summations in (6.26) give 
the selection gradients on traits that would modify the phenotype at each age. 
That is, 


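\begin{align}
\left. \frac{d\lambda}{d\boldsymbol{\theta}^{\mathsf T}} \right|_{\text{age}=i} ={}&
\frac{d\lambda}{d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}}
\left[ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_U \right)
\left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_s \right)
\left( \operatorname{vec} \mathbf{I}_\omega \otimes \mathbf{I}_{s^2} \right)
\frac{d\operatorname{vec} \mathbf{U}_i}{d\boldsymbol{\theta}^{\mathsf T}} \right] \nonumber \\
&+ \frac{d\lambda}{d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}}
\left[ \left( \mathbf{I}_{s\omega} \otimes \mathbf{Q}_F \right)
\left( \mathbf{E}_{ii} \otimes \mathbf{K} \otimes \mathbf{I}_s \right)
\left( \operatorname{vec} \mathbf{I}_\omega \otimes \mathbf{I}_{s^2} \right)
\frac{d\operatorname{vec} \mathbf{F}_i}{d\boldsymbol{\theta}^{\mathsf T}} \right]
\tag{6.29}
\end{align}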


Thus, these terms reveal the selection patterns that would operate on a mutation that 
was able to detect the age of an individual within a given stage, or that affected age 
differentially depending on the stage of the individual. 

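To examine the selection gradients on survival, it is necessary to separate survival
from inter-stage transitions in $\mathbf{U}$. Let $\boldsymbol{\sigma}$ be the vector of stage-specific survival
probabilities. The matrix $\mathbf{U}$ can be written as the product of a matrix $\boldsymbol{\Sigma} = \mathbf{I}_s \circ (\mathbf{1}_s \boldsymbol{\sigma}^{\mathsf T})$
containing the survival probabilities on the diagonal and a matrix $\mathbf{G}$ of transition
probabilities, conditional on survival;

\[
\mathbf{U} = \mathbf{G} \boldsymbol{\Sigma}
\tag{6.30}
\]

(cf. Chap. 8). If $\mathbf{F}$ is independent¹ of $\boldsymbol{\sigma}$, then

\[
d\mathbf{U} = \mathbf{G} \, d\boldsymbol{\Sigma}
\tag{6.31}
\]

Applying the vec operator gives

\begin{align}
d\operatorname{vec} \mathbf{U} &= \left( \mathbf{I}_s \otimes \mathbf{G} \right) \operatorname{vec} \left[ \mathbf{I}_s \circ \left( \mathbf{1}_s \, d\boldsymbol{\sigma}^{\mathsf T} \right) \right] \nonumber \\
&= \left( \mathbf{I}_s \otimes \mathbf{G} \right) \mathcal{D}\left( \operatorname{vec} \mathbf{I}_s \right) \left( \mathbf{I}_s \otimes \mathbf{1}_s \right) d\boldsymbol{\sigma}
\tag{6.32}
\end{align}

which implies that

\[
\frac{d\operatorname{vec} \mathbf{U}}{d\boldsymbol{\sigma}^{\mathsf T}} =
\left( \mathbf{I}_s \otimes \mathbf{G} \right) \mathcal{D}\left( \operatorname{vec} \mathbf{I}_s \right) \left( \mathbf{I}_s \otimes \mathbf{1}_s \right)
\tag{6.33}
\]

¹ By assuming that $\mathbf{F}$ does not depend on $\boldsymbol{\sigma}$, I am in effect choosing a pre-breeding census and
excluding neonatal mortality from $\boldsymbol{\sigma}$.

Setting $\boldsymbol{\theta} = \boldsymbol{\sigma}$ and substituting (6.33) and (6.28) into (6.18) gives the selection
gradient on $\boldsymbol{\sigma}$. Substituting (6.33) and (6.28) into (6.29), with $d\operatorname{vec} \mathbf{F} / d\boldsymbol{\sigma}^{\mathsf T} = 0$,
gives the selection gradient on $\boldsymbol{\sigma}$ as a function of age and stage.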
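
For a purely stage-classified matrix, the chain of (6.28) and (6.33) can be sketched as follows. The three-stage matrices are hypothetical, and $\boldsymbol{\sigma}$ and $\mathbf{G}$ are recovered here from the column sums of $\mathbf{U}$, which is one convenient convention assumed for the illustration (it requires every stage to have positive survival).

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def dominant_eig(A):
    """Dominant eigenvalue with right (w) and left (v) eigenvectors, v @ w == 1."""
    vals, W = np.linalg.eig(A)
    w = W[:, np.argmax(vals.real)].real
    vals_T, V = np.linalg.eig(A.T)
    k = np.argmax(vals_T.real)
    v = V[:, k].real
    w = w / w.sum()
    v = v / (v @ w)
    return vals_T[k].real, w, v

# Hypothetical three-stage model, A = U + F.
U = np.array([[0.1, 0.0, 0.0],
              [0.4, 0.3, 0.0],
              [0.0, 0.5, 0.9]])
F = np.array([[0.0, 1.5, 2.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
s = U.shape[0]

sigma = U.sum(axis=0)    # stage-specific survival = column sums of U
G = U / sigma            # transitions conditional on survival (columns sum to 1)

lam, w, v = dominant_eig(U + F)
dlam_dvecA = np.kron(w, v)                                   # (6.28)
dvecU_dsigma = (np.kron(np.eye(s), G)
                @ np.diag(vec(np.eye(s)))
                @ np.kron(np.eye(s), np.ones((s, 1))))       # (6.33)
gradient = dlam_dvecA @ dvecU_dsigma                          # d(lambda)/d(sigma^T)
```

The result agrees with the element-wise form $d\lambda / d\sigma_j = w_j \sum_i v_i g_{ij}$, which is a useful sanity check on the Kronecker bookkeeping.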


Results The projection matrix $\mathbf{A}$ for Scotch broom² is

\[
\mathbf{A} = \begin{pmatrix}
0.740 & 0 & 3.400 & 47.1 & 108.700 & 1120.0 & 3339.0 \\
0.001 & 0.310 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.350 & 0.310 & 0 & 0 & 0 & 0 \\
0 & 0.038 & 0.290 & 0.024 & 0 & 0 & 0 \\
0 & 0 & 0.069 & 0.390 & 0.320 & 0 & 0.091 \\
0 & 0 & 0 & 0.440 & 0.440 & 0.530 & 0.091 \\
0 & 0 & 0 & 0 & 0.029 & 0.400 & 0.730
\end{pmatrix}
\tag{6.34}
\]

The matrix $\mathbf{U}$ is obtained from $\mathbf{A}$ by setting all elements in the first row, except
for $a_{11}$, to zero. The matrix $\mathbf{F}$ is a 7 × 7 matrix with the elements of row 1, columns
2-7 of $\mathbf{A}$ in the corresponding positions, and zeros elsewhere. The maximum age
was set to $\omega = 30$. The aging matrices $\mathbf{D}_U$ and $\mathbf{D}_F$ are given by (6.2) and (6.3) with
$\omega = 30$. Because the vital rates do not depend on age, the dominant eigenvalues of
$\mathbf{A}$ and $\tilde{\mathbf{A}}$ should be identical, and they are; $\lambda = 1.268$.
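
The equality of the two dominant eigenvalues can be checked with a short script. This is a sketch under stated assumptions: the matrix entries are transcribed from (6.34), and the forms of $\mathbf{D}_U$ (ones on the subdiagonal plus an open-ended last age class) and $\mathbf{D}_F$ (ones in the first row) are assumed here, since (6.2) and (6.3) are not reproduced in this section. The equality relies on the open-ended last age class.

```python
import numpy as np

def vecperm(m, n):
    """Vec-permutation matrix K with K @ vec(X) == vec(X.T) for an m-by-n X."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + i * n, i + j * m] = 1.0
    return K

# Scotch broom projection matrix, transcribed from (6.34).
A = np.array([
    [0.740, 0.0,   3.400, 47.1,  108.700, 1120.0, 3339.0],
    [0.001, 0.310, 0.0,   0.0,   0.0,     0.0,    0.0],
    [0.0,   0.350, 0.310, 0.0,   0.0,     0.0,    0.0],
    [0.0,   0.038, 0.290, 0.024, 0.0,     0.0,    0.0],
    [0.0,   0.0,   0.069, 0.390, 0.320,   0.0,    0.091],
    [0.0,   0.0,   0.0,   0.440, 0.440,   0.530,  0.091],
    [0.0,   0.0,   0.0,   0.0,   0.029,   0.400,  0.730],
])
s = A.shape[0]

U = A.copy()
U[0, 1:] = 0.0                 # first row zeroed except a_11
F = np.zeros_like(A)
F[0, 1:] = A[0, 1:]            # fertilities: row 1, columns 2-7

omega = 30
D_U = np.diag(np.ones(omega - 1), -1)
D_U[-1, -1] = 1.0              # assumed open-ended last age class
D_F = np.zeros((omega, omega))
D_F[0, :] = 1.0                # assumed: all newborns enter age class 1

K = vecperm(s, omega)
bU, bF = np.kron(np.eye(omega), U), np.kron(np.eye(omega), F)  # age-independent rates
bDU, bDF = np.kron(np.eye(s), D_U), np.kron(np.eye(s), D_F)
A_tilde = K.T @ bDU @ K @ bU + K.T @ bDF @ K @ bF              # (6.16)

lam_stage = np.max(np.linalg.eigvals(A).real)
lam_age_stage = np.max(np.linalg.eigvals(A_tilde).real)
print(round(lam_stage, 3), round(lam_age_stage, 3))
```

The two printed values should coincide; the text reports $\lambda = 1.268$ for this population.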

The selection gradients on stage-specific survival (i.e., sensitivities of $\lambda$ to $\boldsymbol{\sigma}$) are
shown in Fig. 6.1. There is a steady decline with increasing stage, from seeds to 
medium-sized adults, but then an increase for large and extra-large adults. A quite 
different pattern emerges when the selection gradients are calculated as functions 
of both age and stage, using (6.29). These results are shown in Fig. 6.2. The
age-specific selection gradients on survival in stages 1-3 are strictly decreasing with age.
But the age-specific selection gradients on survival in the adult stages 4-7 increase
with age, level off, and then decline. The increase is longer and more pronounced in 
the larger adult stages. 

It is now known that this pattern is widespread in plant populations. It appears
in all eight of the Scotch broom populations studied by Parker (2000), and in
almost all of 36 species of plants examined by Caswell and Salguero-Gómez
(2013). It has important implications for the evolution of senescence. Hamilton
(1966) showed that the selection gradient on age-specific mortality always
decreases with age, and argued that this implied that selection would always lead to
senescence. Incorporating stage-dependence as well as age-dependence of the vital
rates means that, over some range of ages, the selection gradient increases
(contra-senescent selection in the terminology of Caswell and Salguero-Gómez 2013). Thus
conclusions that follow from the general decline in selection gradients with age
may not apply to traits that affect age-specific survival differentially depending
on developmental stage. Traits that affect survival in adult stages should postpone
senescence for at least some time.

² This is the matrix for the Discovery Park population, 1993-1994, edge conditions; taken from the
Appendix of Parker (2000).

Fig. 6.1 Sensitivity of population growth rate $\lambda$ to stage-specific survival
probabilities. Calculated for the stage-classified model of Scotch broom (Cytisus
scoparius) using data from Parker (2000). Stages: 1 = seeds, 2 = seedlings,
3 = juveniles, 4 = small adults, 5 = medium adults, 6 = large adults,
7 = extra-large adults

Fig. 6.2 Sensitivity of population growth rate $\lambda$ to stage-specific survival as a
function of age, for Scotch broom. Stages defined as in Fig. 6.1

6.4.2 Distributions of Age and Stage at Death


The pattern of longevity within a population is captured by the probability dis- 
tribution of the age at death, one of the standard results of age-classified life 
table analysis. The moments of the age at death and their sensitivity can also 
be calculated directly from stage-classified models using Markov chain methods 
(Feichtinger 1971b; Caswell 2001, 2006, 2009; Tuljapurkar and Horvitz 2006; 
Horvitz and Tuljapurkar 2008); see Chaps. 4 and 5. Here we can go beyond that
and get the full joint distribution of stage and age at death, along with the marginal
distributions of age at death and stage at death, implied by an age × stage classified
model.

To do this, note that the cohort projection matrix $\tilde{\mathbf{U}}$ describes movement
of individuals among transient states of an absorbing Markov chain, where the
absorbing state is death, or death classified by stage or age at death. The transition
matrix of the chain is

\[
\mathbf{P} = \begin{pmatrix} \tilde{\mathbf{U}} & \mathbf{0} \\ \mathbf{M} & \mathbf{I} \end{pmatrix}
\tag{6.35}
\]


By properly structuring $\mathbf{M}$, the model can give information about the age, stage,
or the joint distribution of age and stage at death.³ Each row of $\mathbf{M}$ corresponds
to an absorbing state, and $m_{ij}$ is the probability of a transition from transient
state $j$ to absorbing state $i$. To compute the distribution of age and stage at
death, we define the absorbing states to correspond to the age × stage combination
at death. Thus $\mathbf{M}$ contains probabilities of death on the diagonal and zeros
elsewhere,

\[
\mathbf{M} = \mathbf{I}_{s\omega} - \mathcal{D}\left( \mathbf{1}_{s\omega}^{\mathsf T} \tilde{\mathbf{U}} \right)
\tag{6.36}
\]

The fundamental matrix of the Markov chain in (6.35) is

\[
\tilde{\mathbf{N}} = \left( \mathbf{I}_{s\omega} - \tilde{\mathbf{U}} \right)^{-1}
\tag{6.37}
\]

The $(i, j)$ element of $\tilde{\mathbf{N}}$ is the expected number of visits that an individual in state $j$
will make to transient state $i$ before death.

³ This also leads to a powerful approach, including sensitivity analysis, for cause of death
calculations (Caswell and Ouellette 2016, 2018).

Consider the eventual fate of an individual starting in transient state $j$. Let

\[
b_{ij} = P\left[ \text{eventual absorption in } i \mid \text{starting in } j \right]
\tag{6.38}
\]

The $b_{ij}$ are the elements of the matrix $\mathbf{B}$ ($s\omega \times s\omega$) given by

\[
\mathbf{B} = \mathbf{M} \tilde{\mathbf{N}}
\tag{6.39}
\]

(Iosifescu 1980, Theorem 3.3; see also Caswell 2001, Section 5.1). Since the
absorbing states (the rows of $\mathbf{M}$) correspond to combinations of age and stage at
death, column $j$ of $\mathbf{B}$ gives the joint distribution of age and stage at death, starting
from state (i.e., age × stage combination) $j$:

\[
\mathbf{B}(:, j) = \mathbf{B} \mathbf{e}_j
\tag{6.40}
\]

using MATLAB notation in which $\mathbf{X}(:, j)$ is column $j$ of $\mathbf{X}$, and where $\mathbf{e}_j$ is a vector
of length $s\omega$ with a 1 in the $j$th entry and zeros elsewhere. The rows of $\mathbf{B}$ correspond
to combinations of stage and age at death. Summing the rows over stages gives the
marginal distribution of age at death, starting in column $j$ of $\mathbf{B}$, as

\[
\mathbf{g}_j = \left( \mathbf{I}_\omega \otimes \mathbf{1}_s^{\mathsf T} \right) \mathbf{B}(:, j)
\qquad \text{marginal age distribution, } \omega \times 1
\tag{6.41}
\]

Similarly, summing over ages gives the marginal distribution of stage at death:

\[
\mathbf{h}_j = \left( \mathbf{1}_\omega^{\mathsf T} \otimes \mathbf{I}_s \right) \mathbf{B}(:, j)
\qquad \text{marginal stage distribution, } s \times 1
\tag{6.42}
\]
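
The calculation in (6.36)-(6.42) can be sketched for the Scotch broom cohort matrix. The entries of $\mathbf{U}$ are transcribed from (6.34) with the fertilities removed; the form of $\mathbf{D}_U$ (open-ended last age class) is an assumption standing in for (6.2), and $\omega = 40$ follows the value used for the results in the text.

```python
import numpy as np

def vecperm(m, n):
    """Vec-permutation matrix K with K @ vec(X) == vec(X.T) for an m-by-n X."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + i * n, i + j * m] = 1.0
    return K

# Cohort (transition) matrix U for Scotch broom: (6.34) without fertilities.
U = np.array([
    [0.740, 0.0,   0.0,   0.0,   0.0,   0.0,   0.0],
    [0.001, 0.310, 0.0,   0.0,   0.0,   0.0,   0.0],
    [0.0,   0.350, 0.310, 0.0,   0.0,   0.0,   0.0],
    [0.0,   0.038, 0.290, 0.024, 0.0,   0.0,   0.0],
    [0.0,   0.0,   0.069, 0.390, 0.320, 0.0,   0.091],
    [0.0,   0.0,   0.0,   0.440, 0.440, 0.530, 0.091],
    [0.0,   0.0,   0.0,   0.0,   0.029, 0.400, 0.730],
])
s, omega = U.shape[0], 40

D_U = np.diag(np.ones(omega - 1), -1)
D_U[-1, -1] = 1.0                        # assumed open-ended last age class

K = vecperm(s, omega)
U_tilde = K.T @ np.kron(np.eye(s), D_U) @ K @ np.kron(np.eye(omega), U)

I = np.eye(s * omega)
M = I - np.diag(U_tilde.sum(axis=0))     # (6.36): death probabilities on the diagonal
N_tilde = np.linalg.inv(I - U_tilde)     # (6.37): fundamental matrix
B = M @ N_tilde                          # (6.39)

j = 0                                    # start as a seed (stage 1) in age class 1
joint = B[:, j].reshape((s, omega), order="F")          # rows: stages, columns: ages
g = np.kron(np.eye(omega), np.ones((1, s))) @ B[:, j]   # (6.41): age at death
h = np.kron(np.ones((1, omega)), np.eye(s)) @ B[:, j]   # (6.42): stage at death
```

Each column of `B` sums to one, because every individual eventually dies; `h[0]` is the probability that a seed dies while still a seed.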


6.4.2.1 Perturbation Analysis 


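In the general sensitivity equation (6.18), the dependent variable $\boldsymbol{\xi} = \mathbf{B}(:, j)$. This
depends only on $\tilde{\mathbf{U}}$, so the first term in (6.18) can be shown to be

\begin{align}
\frac{d\boldsymbol{\xi}}{d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}}
&= \frac{d\mathbf{B}(:, j)}{d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{U}}} \tag{6.43} \\
&= -\left( \mathbf{e}_j^{\mathsf T} \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right)
\mathcal{D}\left( \operatorname{vec} \mathbf{I}_{s\omega} \right)
\left( \mathbf{I}_{s\omega} \otimes \mathbf{1}_{s\omega} \mathbf{1}_{s\omega}^{\mathsf T} \right)
+ \left( \mathbf{e}_j^{\mathsf T} \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{B} \right) \tag{6.44}
\end{align}

The desired derivative $d\mathbf{B}(:, j) / d\boldsymbol{\theta}^{\mathsf T}$ is obtained by substituting (6.44) for
$d\boldsymbol{\xi} / d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}$ in (6.26), setting $d\operatorname{vec} \mathbf{F}_i / d\boldsymbol{\theta}^{\mathsf T} = 0$.

The sensitivities of the marginal distributions of age and stage at death are then
given by

\begin{align}
\frac{d\mathbf{g}_j}{d\boldsymbol{\theta}^{\mathsf T}} &= \left( \mathbf{I}_\omega \otimes \mathbf{1}_s^{\mathsf T} \right) \frac{d\mathbf{B}(:, j)}{d\boldsymbol{\theta}^{\mathsf T}} \tag{6.45} \\
\frac{d\mathbf{h}_j}{d\boldsymbol{\theta}^{\mathsf T}} &= \left( \mathbf{1}_\omega^{\mathsf T} \otimes \mathbf{I}_s \right) \frac{d\mathbf{B}(:, j)}{d\boldsymbol{\theta}^{\mathsf T}} \tag{6.46}
\end{align}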
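
Derivation To derive the sensitivity of the joint distribution of age and stage at
death, conditional on some starting age × stage combination, we start by
differentiating equation (6.40) for column $j$ of $\mathbf{B}$ and applying the vec operator,

\[
d\mathbf{B}(:, j) = \left( \mathbf{e}_j^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right) d\operatorname{vec} \mathbf{B}.
\tag{6.47}
\]

However, from (6.39), $\mathbf{B} = \mathbf{M} \tilde{\mathbf{N}}$, so

\[
d\mathbf{B} = (d\mathbf{M}) \tilde{\mathbf{N}} + \mathbf{M} (d\tilde{\mathbf{N}})
\tag{6.48}
\]

and

\[
d\operatorname{vec} \mathbf{B} = \left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right) d\operatorname{vec} \mathbf{M}
+ \left( \mathbf{I}_{s\omega} \otimes \mathbf{M} \right) d\operatorname{vec} \tilde{\mathbf{N}}.
\tag{6.49}
\]

The differential of the fundamental matrix $\tilde{\mathbf{N}}$ is

\[
d\operatorname{vec} \tilde{\mathbf{N}} = \left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \tilde{\mathbf{N}} \right) d\operatorname{vec} \tilde{\mathbf{U}}
\tag{6.50}
\]

(Caswell 2006; see Chap. 5). The differential of $\mathbf{M}$ is obtained by rewriting (6.36)
as

\[
\mathbf{M} = \mathbf{I}_{s\omega} - \mathbf{I}_{s\omega} \circ \left( \mathbf{1}_{s\omega} \mathbf{1}_{s\omega}^{\mathsf T} \tilde{\mathbf{U}} \right),
\tag{6.51}
\]

differentiating,

\[
d\mathbf{M} = -\mathbf{I}_{s\omega} \circ \left( \mathbf{1}_{s\omega} \mathbf{1}_{s\omega}^{\mathsf T} \, d\tilde{\mathbf{U}} \right),
\tag{6.52}
\]

and applying the vec operator to obtain

\[
d\operatorname{vec} \mathbf{M} = -\mathcal{D}\left( \operatorname{vec} \mathbf{I}_{s\omega} \right)
\left( \mathbf{I}_{s\omega} \otimes \mathbf{1}_{s\omega} \mathbf{1}_{s\omega}^{\mathsf T} \right) d\operatorname{vec} \tilde{\mathbf{U}}
\tag{6.53}
\]

Substituting (6.50) and (6.53) into (6.49) gives

\begin{align}
d\operatorname{vec} \mathbf{B} = \Big[ & -\left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right)
\mathcal{D}\left( \operatorname{vec} \mathbf{I}_{s\omega} \right)
\left( \mathbf{I}_{s\omega} \otimes \mathbf{1}_{s\omega} \mathbf{1}_{s\omega}^{\mathsf T} \right) \nonumber \\
& + \left( \mathbf{I}_{s\omega} \otimes \mathbf{M} \right) \left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \tilde{\mathbf{N}} \right) \Big] \, d\operatorname{vec} \tilde{\mathbf{U}}
\tag{6.54}
\end{align}

Substituting this into (6.47) gives

\begin{align}
d\mathbf{B}(:, j) = \Big[ & -\left( \mathbf{e}_j^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right) \left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right)
\mathcal{D}\left( \operatorname{vec} \mathbf{I}_{s\omega} \right)
\left( \mathbf{I}_{s\omega} \otimes \mathbf{1}_{s\omega} \mathbf{1}_{s\omega}^{\mathsf T} \right) \nonumber \\
& + \left( \mathbf{e}_j^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right) \left( \mathbf{I}_{s\omega} \otimes \mathbf{M} \right)
\left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \tilde{\mathbf{N}} \right) \Big] \, d\operatorname{vec} \tilde{\mathbf{U}}
\tag{6.55}
\end{align}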

Equation (6.55) can be simplified to obtain (6.44), using the fact that

\[
(\mathbf{A} \otimes \mathbf{B}) (\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D}),
\]

provided the products exist.
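
For readers who want the intermediate steps, the two Kronecker products in (6.55) collapse as follows; this worked step is added for illustration and uses only the mixed-product rule above together with $\mathbf{B} = \mathbf{M} \tilde{\mathbf{N}}$ from (6.39).

```latex
\begin{align*}
\left( \mathbf{e}_j^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right)
\left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right)
  &= \mathbf{e}_j^{\mathsf T} \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{I}_{s\omega}, \\
\left( \mathbf{e}_j^{\mathsf T} \otimes \mathbf{I}_{s\omega} \right)
\left( \mathbf{I}_{s\omega} \otimes \mathbf{M} \right)
\left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \tilde{\mathbf{N}} \right)
  &= \left( \mathbf{e}_j^{\mathsf T} \otimes \mathbf{M} \right)
     \left( \tilde{\mathbf{N}}^{\mathsf T} \otimes \tilde{\mathbf{N}} \right) \\
  &= \mathbf{e}_j^{\mathsf T} \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{M} \tilde{\mathbf{N}}
   = \mathbf{e}_j^{\mathsf T} \tilde{\mathbf{N}}^{\mathsf T} \otimes \mathbf{B}.
\end{align*}
```

Inserting these two results into (6.55) reproduces the two terms of (6.44).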


Results Figure 6.3 shows the joint distribution of age and stage at death for a
seed of age 1 (one definition of “newborn” in this life cycle), with $\omega = 40$.
Almost all seeds will die as seeds, because the germination probability is low,
$a_{21} = 0.001$; see (6.34). The fates of seedlings (another possible choice for newborn
status) are more diverse, and those of juveniles and small adults even more so; the
distributions show what proportion will die as seedlings, juveniles, etc., and at what
ages (Fig. 6.3).

The marginal distribution of age at death, for individuals in each initial stage, is
given in Fig. 6.4. Not surprisingly, larger stages have an age distribution of death
shifted to later ages, including some probability of survival to age class $\omega$ (> 40
years in this calculation).

The sensitivity of $\mathbf{g}_2$ (the marginal distribution of age at death for a seedling)
is shown in Fig. 6.5. Changes in the survival of seeds ($\sigma_1$) have no effect on this
distribution, because seedlings have already left the seed stage. Changes in $\sigma_2$ through $\sigma_7$
shift the distribution to progressively older ages, by reducing the probability of death
at young ages and increasing it at older ages.

Fig. 6.3 The joint probability distribution of age (1, ..., 10) and stage (1, ..., 7) at death for an
individual seed, seedling, juvenile, or small adult of Scotch broom. Stages as in Fig. 6.1

Fig. 6.4 The marginal distributions of age at death for individuals of Scotch
broom in each stage. Maximum age is $\omega = 40$. Stages as in Fig. 6.1

Fig. 6.5 Sensitivity of the marginal distribution of age at death, $\mathbf{g}_2$, to the survival probabilities of
each stage, for an individual starting in stage 2 (seedlings). Stages as in Fig. 6.1

6.5 Discussion


Models in which individuals are classified by both age and stage extend demo- 
graphic analyses in several directions. They permit biodemographic analyses of 
aging to take advantage of the many stage-classified demographic analyses
accumulated by ecologists (Salguero-Gómez et al. 2015, 2016). They also permit human
demographers to take account of factors other than age in determining mortality, 
longevity, fertility, and population dynamics. 

Age- and stage-specific demographic processes are often combined in demogra- 
phy using multistate life table methods (e.g., Rogers 1975; Willekens 2002, 2014). 
These are usually focused on cohort dynamics and associated survival statistics (but 
see Rogers 1975, Chap. 5 for an explicit consideration of population projection). 
Multistate life table models are written as continuous-parameter, discrete-state 
Markov chains, where the parameter represents age and the states represent stages. 
In order to solve the resulting equations, the dynamics must be approximated over a 
(usually short) finite age interval; this would correspond to the sequence of matrices 
$\mathbf{A}_i$ in the model here. The age × stage-classified model described by $\tilde{\mathbf{A}}$ is a way
to solve the discretized equations in a single step, and makes possible a variety of
analyses that are difficult or impossible in the usual life table formulation. Further
investigation of the relation between continuous multistate life table methods and
age × stage-classified models will be interesting.

These analyses blur the distinction (Chap. 5) between implicit and explicit age 
dependence. If the $\mathbf{A}_i$ are truly identical, by definition only implicit age dependence
is revealed. But the structure of the age × stage model separates all of the
age-dependent $\mathbf{A}_i$, and thus is ready to include any degree of explicit joint dependence
of the vital rates on age and stage. 

Given sufficient longitudinal data on both age and stage, it is possible to estimate 
the stage-specific matrices $\mathbf{A}_i$ as explicit functions of age; see Peeters et al. (2002)
for an example of a study of human heart disease, and Lebreton et al. (2009) for
a review of methods used in multistate capture-mark-recapture analysis in ecology.
Needless to say, the data requirements for a full age × stage parameterization are
challenging. I suspect that the development of estimation methods at intermediate 
levels of detail will be an important step. 


6.5.1 Reducibility and Ergodicity 


The properties of $\tilde{\mathbf{A}}$ raise an important theoretical and technical issue regarding
population growth, fitness, and selection gradients. The use of $\lambda$ as a measure of
fitness is usually justified by the strong ergodic theorem (Cohen 1979; Caswell 2001,
Section 4.5.2), which guarantees the eventual convergence to the stable population
structure and growth at a rate given by the dominant eigenvalue $\lambda$. A sufficient
condition for this convergence is that the projection matrix be irreducible; i.e.,
that there exist a pathway connecting any two stages. Stott et al. (2010) surveyed
published population projection matrices and found that reducible matrices were 
not uncommon, and explored the implications for ergodicity. Reducible matrices are 
not as bad as some people think, but it is important to understand their implications, 
especially for age × stage models.

General results about the irreducibility of block-structured matrices are difficult; 
see Csetenyi and Logofet (1989), Logofet (1993, Chap. 3), and Logofet and Belova 
(2007) for some important graph-theoretical results. However, the age × stage
matrices developed here are unusual among population models in that they are 
(almost) always reducible, because they contain categories to which there are no 
possible pathways. This arises because age 1 individuals are produced only by 
reproduction. Hence there can never be age 1 individuals in any stage that is not 
produced by reproduction. For example, Scotch broom reproduces only by seeds, 
so age 1 seeds appear in the model. However, the matrix $\tilde{\mathbf{A}}$ also contains entries
corresponding to age 1 seedlings, age 1 juveniles, age 1 adults, etc. These do not
exist, and because there are no pathways to these stages from any other stages, the
matrix $\tilde{\mathbf{A}}$ is reducible.

The Perron-Frobenius theorem guarantees that a reducible non-negative matrix 
will have a real, non-negative, dominant eigenvalue that is at least as large as any 
of the others. However, the asymptotic population growth rate and structure may 
depend on initial conditions (Caswell 2001, Section 4.5.4). This means that one must
ascertain that the eigenvalues and eigenvectors under analysis correspond to initial 
conditions of interest. 

Appendix A shows that a necessary and sufficient condition for population
growth to be described by the dominant eigenvalue $\lambda$ of $\tilde{\mathbf{A}}$, regardless of the
(non-negative and non-zero) initial population vector, is that the left eigenvector $\mathbf{v}$ be
strictly positive, and that this corresponds to a particular block-triangular form of $\tilde{\mathbf{A}}$.
This provides a simple check for the ergodicity of population growth, and justifies
the use of $\lambda$ as a population growth rate and measure of fitness.

Primitivity may be difficult to evaluate for an age × stage matrix (but see Logofet
1993) but as with any projection matrix model, the long-term average growth rate
of an imprimitive matrix is still given by the dominant real eigenvalue.

The matrix $\tilde{\mathbf{A}}$ constructed for Scotch broom from (6.34) is reducible, as shown by calculating
$\left( \mathbf{I}_{s\omega} + \tilde{\mathbf{A}} \right)^{s\omega - 1}$ and finding that this matrix contains zeros (Caswell 2001). However,
the left eigenvector $\mathbf{v}$ is strictly positive, so we know that the population eventually
grows at the rate $\lambda$ regardless of initial conditions.
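
Both checks are easy to automate. The sketch below applies them to a small hypothetical age × stage matrix (two stages, three age classes) rather than to the full Scotch broom matrix. The function names are invented for the illustration, the Kronecker form $\mathbf{D}_U \otimes \mathbf{U} + \mathbf{D}_F \otimes \mathbf{F}$ is used as a shortcut that is equivalent to (6.16) when the vital rates do not depend on age, and the forms of $\mathbf{D}_U$ and $\mathbf{D}_F$ are assumed as noted in the comments.

```python
import numpy as np

def is_irreducible(A):
    """True if (I + A)^(n-1) has no zero entries; only the zero pattern is tracked."""
    n = A.shape[0]
    P = (np.eye(n) + A) > 0
    R = np.eye(n, dtype=bool)
    for _ in range(n - 1):
        R = (R.astype(int) @ P.astype(int)) > 0
    return bool(R.all())

def dominant_left_eigvec(A):
    """Left eigenvector of the dominant eigenvalue, scaled to sum to 1."""
    vals, V = np.linalg.eig(A.T)
    v = V[:, np.argmax(vals.real)].real
    return v / v.sum()

# Hypothetical two-stage model; reproduction creates stage 1 individuals only.
U = np.array([[0.2, 0.0], [0.3, 0.7]])
F = np.array([[0.0, 2.0], [0.0, 0.0]])
omega = 3
D_U = np.diag(np.ones(omega - 1), -1)
D_U[-1, -1] = 1.0                          # assumed open-ended last age class
D_F = np.zeros((omega, omega))
D_F[0, :] = 1.0                            # assumed: newborns enter age class 1

A_stage = U + F
A_tilde = np.kron(D_U, U) + np.kron(D_F, F)   # age-independent shortcut for (6.16)

stage_irreducible = is_irreducible(A_stage)
age_stage_irreducible = is_irreducible(A_tilde)
v = dominant_left_eigvec(A_tilde)
```

In this example the stage-classified matrix is irreducible, the age × stage matrix is not (age 1, stage 2 cannot be reached), and yet every entry of `v` is positive. Working with the zero pattern instead of the numerical power avoids overflow when fertilities are large.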


6.5.2 A Protocol for Age × Stage-Classified Models


The approach outlined here gives a step-by-step procedure for constructing and
analyzing age × stage-classified matrix population models.

1. Choose a question, and a corresponding demographic outcome. Are you
interested in population dynamics (growth, structure, transients)? Or in cohort
dynamics (survival, longevity)? Or in some combination of the two?

2. Obtain the stage-classified projection matrices $\mathbf{A}_i$ for ages $i = 1, \ldots, \omega$.

3. Decompose $\mathbf{A}_i = \mathbf{U}_i + \mathbf{F}_i$.

4. Construct the block-diagonal matrices $\mathbb{A}$, $\mathbb{F}$, $\mathbb{U}$, and $\mathbb{D}$, according to Eqs. (6.5)-(6.10).

5. Construct the age × stage matrices $\tilde{\mathbf{A}}$, $\tilde{\mathbf{F}}$, $\tilde{\mathbf{U}}$ using (6.16) and, if appropriate for
the question at hand, also $\mathbf{M}$ and $\mathbf{P}$ using (6.35) and (6.36).

6. Analyze the model, e.g., by computing eigenvalues, eigenvectors, the fundamen- 
tal matrix, etc., as appropriate. If necessary, check for reducibility and ergodicity 
using the methods in Sect. 6.5.1. 

7. For sensitivity analysis, 


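(a) choose a dependent variable $\boldsymbol{\xi}$ and a vector of parameters $\boldsymbol{\theta}$,
(b) compute the sensitivity matrix $d\boldsymbol{\xi} / d\operatorname{vec}^{\mathsf T} \tilde{\mathbf{A}}$,
(c) compute the matrices $d\operatorname{vec} \mathbf{A}_i / d\boldsymbol{\theta}^{\mathsf T}$, $d\operatorname{vec} \mathbf{U}_i / d\boldsymbol{\theta}^{\mathsf T}$, and $d\operatorname{vec} \mathbf{F}_i / d\boldsymbol{\theta}^{\mathsf T}$,
(d) compute $d\boldsymbol{\xi} / d\boldsymbol{\theta}^{\mathsf T}$ according to (6.18).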


The explicit connection between matrix population models and absorbing 
Markov chain theory makes it possible to analyze both population dynamics and 
cohort dynamics in a unified framework (cf. Feichtinger 1971a; Caswell 2001, 
2006, 2009). Cohort dynamics are, in essence, the demography of individuals. 
It may seem paradoxical to speak of the demography of individuals, but that is 
what it is, because the statistical properties of a cohort (e.g., average lifespan) are 
probabilistic properties of an individual (e.g., life expectancy). Demography in 
general, and matrix population models in particular, provides the link between the 
individual and the population. 


A Appendix: Population Growth and Reducible Matrices 


Some ergodic properties of population growth under the action of reducible matrices 
are described by Caswell (2001, Section 4.5.4). Here we can extend the analysis.

Let A be a reducible non-negative projection matrix. By permutation of its rows 
and columns (i.e., renumbering the stages in the life cycle), A can be transformed to 
a block lower-triangular form. Here is an example: 


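\[
\mathbf{A} = \begin{pmatrix}
\mathbf{B}_{11} & 0 & 0 & 0 \\
\mathbf{B}_{21} & \mathbf{B}_{22} & 0 & 0 \\
\mathbf{B}_{31} & \mathbf{B}_{32} & \mathbf{B}_{33} & 0 \\
\mathbf{B}_{41} & \mathbf{B}_{42} & \mathbf{B}_{43} & \mathbf{B}_{44}
\end{pmatrix}
\tag{6.56}
\]

In this form, all the diagonal blocks $\mathbf{B}_{ii}$ are either irreducible matrices or 1 × 1 (i.e.,
scalar) zero matrices. The block triangular form is unique, up to a renumbering
of the blocks and permutation of indices within blocks (Gantmacher 1959). It
corresponds to a decomposition of the state space into a set of subspaces; let $R_i$
be the subspace corresponding to the block $\mathbf{B}_{ii}$.

Some or all of the subdiagonal blocks in (6.56) may be zero. For reasons that will
become apparent, consider an example where $\mathbf{B}_{21} = \mathbf{B}_{43} = 0$; i.e.,

\[
\mathbf{A} = \begin{pmatrix}
\mathbf{B}_{11} & 0 & 0 & 0 \\
0 & \mathbf{B}_{22} & 0 & 0 \\
\mathbf{B}_{31} & \mathbf{B}_{32} & \mathbf{B}_{33} & 0 \\
\mathbf{B}_{41} & \mathbf{B}_{42} & 0 & \mathbf{B}_{44}
\end{pmatrix}
\tag{6.57}
\]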


Gantmacher (1959, Section 13.4) calls a block $\mathbf{B}_{ii}$ isolated if there are no other
non-zero blocks on its row, that is, if $\mathbf{B}_{ij} = 0$ for $j < i$. I will call such a block
row-isolated, and introduce the term column-isolated to describe any block $\mathbf{B}_{ii}$ with
no other non-zero blocks in its column, that is, $\mathbf{B}_{ji} = 0$ for $j > i$. In the matrix
in (6.57), the blocks $\mathbf{B}_{11}$ and $\mathbf{B}_{22}$ are row-isolated and the blocks $\mathbf{B}_{33}$ and $\mathbf{B}_{44}$ are
column-isolated.

If $\mathbf{B}_{ii}$ is row-isolated, then the life cycle graph contains no pathways from any
state outside of the subspace $R_i$ to any state inside $R_i$, and $R_i$ is a source. If $\mathbf{B}_{ii}$ is
column-isolated, then the life cycle graph contains no pathways from any state in
$R_i$ to any state outside $R_i$, and $R_i$ is a sink.

The eigenvalues of $\mathbf{A}$ are the eigenvalues of the diagonal blocks $\mathbf{B}_{ii}$. Let $\lambda_1$
be the dominant eigenvalue of $\mathbf{A}$, with right and left eigenvectors $\mathbf{w}_1$ and $\mathbf{v}_1$. The
Perron-Frobenius theorem guarantees that $\lambda_1$, $\mathbf{w}_1$, and $\mathbf{v}_1$ are real and non-negative.
Gantmacher (1959, Chap. 13, Theorem 6) proves that the eigenvector $\mathbf{w}_1$ is strictly
positive if and only if $\lambda_1$ is an eigenvalue of every row-isolated block, and is not an
eigenvalue of any of the non-row-isolated blocks. This makes it easy to demonstrate
the following corollary.


Corollary: Positivity of $\mathbf{v}_1$ Let $\mathbf{v}_1$ be the left eigenvector corresponding to $\lambda_1[\mathbf{A}]$.
Then $\mathbf{v}_1$ is strictly positive if and only if $\lambda_1[\mathbf{A}]$ is an eigenvalue of every
column-isolated block, and is not an eigenvalue of any non-column-isolated block.

To see this, note that $\mathbf{v}_1$ is the right eigenvector of $\mathbf{A}^{\mathsf T}$. The column-isolated
blocks of $\mathbf{A}$ become row-isolated blocks of the block lower-triangular form of $\mathbf{A}^{\mathsf T}$,
and application of Gantmacher’s Theorem 6 proves the Corollary.

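For example, transposing (6.57) gives

\[
\mathbf{A}^{\mathsf T} = \begin{pmatrix}
\mathbf{B}_{11}^{\mathsf T} & 0 & \mathbf{B}_{31}^{\mathsf T} & \mathbf{B}_{41}^{\mathsf T} \\
0 & \mathbf{B}_{22}^{\mathsf T} & \mathbf{B}_{32}^{\mathsf T} & \mathbf{B}_{42}^{\mathsf T} \\
0 & 0 & \mathbf{B}_{33}^{\mathsf T} & 0 \\
0 & 0 & 0 & \mathbf{B}_{44}^{\mathsf T}
\end{pmatrix}
\tag{6.58}
\]

Reversing the order of the rows and columns gives the block lower-triangular form

\[
\begin{pmatrix}
\mathbf{B}_{44}^{\mathsf T} & 0 & 0 & 0 \\
0 & \mathbf{B}_{33}^{\mathsf T} & 0 & 0 \\
\mathbf{B}_{42}^{\mathsf T} & \mathbf{B}_{32}^{\mathsf T} & \mathbf{B}_{22}^{\mathsf T} & 0 \\
\mathbf{B}_{41}^{\mathsf T} & \mathbf{B}_{31}^{\mathsf T} & 0 & \mathbf{B}_{11}^{\mathsf T}
\end{pmatrix}
\tag{6.59}
\]

The column-isolated blocks in $\mathbf{A}$ ($\mathbf{B}_{33}$ and $\mathbf{B}_{44}$) now appear as row-isolated blocks
in $\mathbf{A}^{\mathsf T}$. Gantmacher’s result shows that the eigenvector $\mathbf{v}_1$ will be positive if and only
if $\lambda_1$ is an eigenvalue of each of those blocks.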

The usefulness of the Corollary follows from the population projection model

\[
\mathbf{n}(t+1) = \mathbf{A} \mathbf{n}(t), \qquad \mathbf{n}(0) = \mathbf{n}_0
\tag{6.60}
\]

and its solution⁴

\begin{align}
\mathbf{n}(t) &= \sum_{i=1}^{s} c_i \lambda_i^t \mathbf{w}_i \tag{6.61} \\
&= \sum_{i=1}^{s} \left( \mathbf{v}_i^{\mathsf T} \mathbf{n}_0 \right) \lambda_i^t \mathbf{w}_i \tag{6.62}
\end{align}

(Caswell 2001). If $\mathbf{n}_0$ is such that $c_1 = \mathbf{v}_1^{\mathsf T} \mathbf{n}_0$ is positive, then $\lambda_1^t$ will eventually
dominate all other terms in the solution and the population will grow at the rate $\lambda_1$
with stable structure $\mathbf{w}_1$. We know the following about $c_1$:

1. If $\mathbf{A}$ is irreducible, then by the Perron-Frobenius theorem $\mathbf{v}_1$ is strictly positive,
so any non-negative, non-zero initial population $\mathbf{n}_0$ leads to a positive value of $c_1$
and eventual growth at the rate $\lambda_1$.

2. If $\mathbf{A}$ is reducible and $\mathbf{v}_1$ is strictly positive, any non-negative, non-zero $\mathbf{n}_0$ leads
to a positive value of $c_1$ and growth at the rate $\lambda_1$.

3. If $\mathbf{A}$ is reducible and $\mathbf{v}_1$ contains zero entries corresponding to a subspace $R_i$,
then initial conditions with positive support only in $R_i$ will lead to $c_1 = 0$, and
$\lambda_1$ will make no contribution to population growth from those initial vectors.

In the first two cases, population growth is ergodic from any non-zero initial
population. In the third case, there exists a basin of attraction leading to growth
according to $\lambda_1$, and a basin (or basins) of attraction for growth according to the
dominant eigenvalues of the diagonal blocks $\mathbf{B}_{ii}$ corresponding to the zero entries
of $\mathbf{v}_1$.
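
The three cases can be illustrated with the smallest possible reducible matrices. The two 2 × 2 matrices below are hypothetical, with scalar diagonal blocks chosen so that the dominant eigenvalue sits either in the upper block (not column-isolated) or in the lower, column-isolated block; `growth_rate` is a helper written for this sketch that estimates the long-run growth rate by direct projection.

```python
import numpy as np

def left_eigvec(A):
    """Left eigenvector of the dominant eigenvalue, scaled to sum to 1."""
    vals, V = np.linalg.eig(A.T)
    v = V[:, np.argmax(vals.real)].real
    return v / v.sum()

def growth_rate(A, n0, steps=200):
    """Per-step growth of total population size after many projection steps."""
    n = np.asarray(n0, dtype=float)
    rate = np.nan
    for _ in range(steps):
        new = A @ n
        rate = new.sum() / n.sum()
        n = new / new.sum()       # renormalise to avoid overflow or underflow
    return rate

# Case 3: lambda_1 = 1.2 belongs to B_11 only, so v_1 has a zero entry for R_2.
A1 = np.array([[1.2, 0.0],
               [0.1, 0.5]])
# Case 2: lambda_1 = 1.2 belongs to the column-isolated block B_22, so v_1 > 0.
A2 = np.array([[0.5, 0.0],
               [0.1, 1.2]])

v1_A1 = left_eigvec(A1)
v1_A2 = left_eigvec(A2)
rates_A1 = [growth_rate(A1, [1, 0]), growth_rate(A1, [0, 1])]
rates_A2 = [growth_rate(A2, [1, 0]), growth_rate(A2, [0, 1])]
```

For `A1`, a population that starts entirely in the second state never feels $\lambda_1$ and changes at the rate of its own block; for `A2`, both initial conditions converge to the same rate.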


⁴ This holds provided that $\mathbf{A}$ is diagonalizable, which is a generic property for linear operators
(Hirsch and Smale 1974, p. 157).

Part III
Time-Varying and Stochastic Models


Chapter 7
Transient Population Dynamics


7.1 Introduction 


Short-term, transient population dynamics can differ in important ways from long- 
term asymptotic dynamics. Just as perturbation analysis (sensitivity and elasticity) 
of the asymptotic growth rate reveals the effects of the vital rates on long- 
term growth (Chap. 3), the perturbation analysis of transient dynamics can reveal 
the determinants of short-term patterns. This chapter presents a comprehensive 
approach to transient sensitivity analysis that applies to linear time-invariant, time- 
varying, subsidized, stochastic, nonlinear, and spatial models. 

In a constant environment, once a population converges to its stable stage 
structure, it grows exponentially at a constant rate $\lambda$. However, depending on initial
conditions, short-term transient dynamics can differ from the asymptotic dynamics.
It has long been recognized that a focus on $\lambda$ alone can obscure these important
transient effects (e.g., Lotka 1939; Coale 1972). There have been attempts to develop 
transient sensitivity analyses using all the eigenvalues of the projection matrix 
(Fox and Gurevitch 2000), but these are complicated to calculate and limited in 
application. Matrix calculus allows us to do better (Caswell 2007). 


7.2 Time-Invariant Models 


Armed with matrix calculus, consider the linear time-invariant model, 


$$\mathbf{n}(t+1) = \mathbf{A}\,\mathbf{n}(t), \qquad \mathbf{n}(0) = \mathbf{n}_0, \tag{7.1}$$


Chapter 7 is modified, by permission of John Wiley and Sons, from Caswell, H. 2007. Sensitivity
analysis of transient population dynamics. Ecology Letters 10:1-15.

where $\mathbf{n}$ is $s \times 1$ and $\mathbf{A}$ is $s \times s$, with $s$ the number of stages. Assume that
$\mathbf{A} = \mathbf{A}[\boldsymbol{\theta}]$ depends on a $p \times 1$ vector of parameters $\boldsymbol{\theta}$, which could be entries of $\mathbf{A}$,
lower-level parameters, or elements of the initial vector.

The sequence of matrices 


$$\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}, \qquad t = 1, 2, \ldots \tag{7.2}$$


gives the effect of all the parameters on all the entries of n(t). From it we can 
calculate the sensitivities and elasticities of other dependent variables (Sect. 7.3). 
We differentiate the model (7.1), obtaining 


dn(t + 1) = A dn(t) + (dA) n(t), (7.3) 


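and then apply the vec operator to both sides, remembering that since $\mathbf{n}$ is a vector,
$\operatorname{vec}\mathbf{n} = \mathbf{n}$, and using $\operatorname{vec}(\mathbf{ABC}) = (\mathbf{C}^{\mathsf{T}} \otimes \mathbf{A}) \operatorname{vec}\mathbf{B}$ to write
$\operatorname{vec}\left[(d\mathbf{A})\,\mathbf{n}(t)\right] = \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) d\operatorname{vec}\mathbf{A}$,

$$d\mathbf{n}(t+1) = \mathbf{A}\,d\mathbf{n}(t) + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) d\operatorname{vec}\mathbf{A}. \tag{7.4}$$

Then the first identification theorem and the chain rule, from Eqs. (2.47) and (2.18),
give the sensitivity of $\mathbf{n}(t+1)$ to the elements of $\mathbf{A}$,

$$\frac{d\mathbf{n}(t+1)}{d\operatorname{vec}^{\mathsf{T}}\mathbf{A}} = \mathbf{A}\,\frac{d\mathbf{n}(t)}{d\operatorname{vec}^{\mathsf{T}}\mathbf{A}} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right). \tag{7.5}$$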


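The chain rule extends (7.5) to give the sensitivity to lower-level parameters,

$$\begin{aligned}
\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} &= \frac{d\mathbf{n}(t+1)}{d\operatorname{vec}^{\mathsf{T}}\mathbf{A}}\,\frac{d\operatorname{vec}\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}} \\
&= \mathbf{A}\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{d\operatorname{vec}\mathbf{A}}{d\boldsymbol{\theta}^{\mathsf{T}}}.
\end{aligned} \tag{7.6}$$
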
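Equations (7.5) and (7.6) are matrix difference equations in the sensitivities of $\mathbf{n}(t)$
to the elements of $\operatorname{vec}\mathbf{A}$ or of $\boldsymbol{\theta}$. If we know $d\mathbf{n}(t)/d\boldsymbol{\theta}^{\mathsf{T}}$ and $\mathbf{n}(t)$, we can calculate
$d\mathbf{n}(t+1)/d\boldsymbol{\theta}^{\mathsf{T}}$ and $\mathbf{n}(t+1)$ and continue this iteration to obtain the transient
sensitivities at any time. If the parameters in $\boldsymbol{\theta}$ affect the vital rates but not the
initial population, the appropriate initial condition for this iteration is

$$\frac{d\mathbf{n}(0)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{0}_{s \times p}. \tag{7.7}$$

If $\boldsymbol{\theta}$ affects only the initial population, then

$$\frac{d\mathbf{n}(0)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{I}_{s \times s} \tag{7.8}$$

gives the sensitivity of transient dynamics to a change in initial conditions.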
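
The iteration of (7.6) from the initial condition (7.7) can be sketched in a few lines. What follows is a minimal illustration in Python with NumPy, not the MATLAB code of Caswell (2007); the function name `transient_sensitivity`, the two-stage matrix, and the choice of $\boldsymbol{\theta} = \operatorname{vec}\mathbf{A}$ are assumptions made for the example. The vec operator is taken to stack columns, which corresponds to column-major (`order="F"`) indexing in NumPy.

```python
import numpy as np

def transient_sensitivity(A, dvecA_dtheta, n0, T):
    """Iterate Eq. (7.6) alongside the projection n(t+1) = A n(t).

    A            : (s, s) projection matrix
    dvecA_dtheta : (s*s, p) derivative of vec A (column-major) w.r.t. theta^T
    n0           : (s,) initial vector, assumed not to depend on theta (Eq. 7.7)
    Returns lists n[t] and S[t] = dn(t)/dtheta^T for t = 0, ..., T.
    """
    s = A.shape[0]
    p = dvecA_dtheta.shape[1]
    n = [np.asarray(n0, dtype=float)]
    S = [np.zeros((s, p))]                          # Eq. (7.7)
    for t in range(T):
        kron = np.kron(n[t][None, :], np.eye(s))    # (n^T(t) kron I_s), shape (s, s*s)
        S.append(A @ S[t] + kron @ dvecA_dtheta)    # Eq. (7.6)
        n.append(A @ n[t])                          # Eq. (7.1)
    return n, S

# Hypothetical two-stage example. theta is taken to be vec A itself, so
# dvec A / dtheta^T is the identity and Eq. (7.6) reduces to Eq. (7.5).
A = np.array([[0.5, 2.0],
              [0.3, 0.8]])
n, S = transient_sensitivity(A, np.eye(4), n0=[1.0, 0.0], T=5)
```

With this choice of $\boldsymbol{\theta}$, column $i + s\,j$ (zero-based) of `S[t]` holds the sensitivity of $\mathbf{n}(t)$ to $a_{ij}$, and a small finite-difference perturbation of that entry should reproduce it to first order.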
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7.3 Sensitivity of What? Choosing Dependent Variables 


The sensitivity of other dependent variables may be more interesting than that 
of n(t). In an early (and relatively crude) transient analysis, Caswell and Werner 
(1978) analyzed the transient dynamics of the plant teasel (Dipsacus sylvestris) in 
terms of rosette area at time t (which might affect resistance to invasion by later 
successional species) and cumulative seed production up to time t (which might 
affect colonization of new sites). For a weedy species like teasel, either of these 
dependent variables might be more relevant than the asymptotic growth rate. 

Here are some other biologically interesting dependent variables. They are easy
to calculate from $d\mathbf{n}(t)/d\boldsymbol{\theta}^{\mathsf{T}}$.


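1. Population density, as measured by a weighted sum of stage densities. Let $\mathbf{c} \ge 0$
be a weight vector. Then population density is $N(t) = \mathbf{c}^{\mathsf{T}}\mathbf{n}(t)$. This includes total
density ($\mathbf{c} = \mathbf{1}_s$, a vector of ones), the density of a subset of stages ($c_i = 1$ for
stages to be counted; $c_i = 0$ otherwise), biomass ($c_i$ is the biomass of stage $i$),
basal area, metabolic rate, etc. The sensitivity of $N(t)$ is

$$\frac{dN(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{c}^{\mathsf{T}}\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{7.9}$$

2. Ratios measuring the relative abundances of different stages:

$$R(t) = \frac{\mathbf{a}^{\mathsf{T}}\mathbf{n}(t)}{\mathbf{b}^{\mathsf{T}}\mathbf{n}(t)}, \tag{7.10}$$

where $\mathbf{a}$ and $\mathbf{b}$ are weight vectors. Examples include the dependency ratio (in
human demography, the ratio of the individuals below 15 or above 65 to those
between 15 and 65), the sex ratio in a two-sex model, and the ratio of juveniles to
adults, which is important in wildlife management (Williams et al. 2002; Skalski
et al. 2005). The sensitivity of $R(t)$ is

$$\frac{dR(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \left(\frac{\mathbf{b}^{\mathsf{T}}\mathbf{n}(t)\,\mathbf{a}^{\mathsf{T}} - \mathbf{a}^{\mathsf{T}}\mathbf{n}(t)\,\mathbf{b}^{\mathsf{T}}}{\left(\mathbf{b}^{\mathsf{T}}\mathbf{n}(t)\right)^2}\right) \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{7.11}$$

3. Cumulative density up to a specified time,

$$C(t) = \sum_{i=0}^{t} N(i), \tag{7.12}$$

the sensitivity of which is

$$\frac{dC(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{c}^{\mathsf{T}} \sum_{i=0}^{t} \frac{d\mathbf{n}(i)}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{7.13}$$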
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4. Average density over an interval, 


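$$\bar{N}(t_1, t_2) = \frac{1}{t_2 - t_1} \sum_{i=t_1}^{t_2} N(i), \tag{7.14}$$

the sensitivity of which is

$$\frac{d\bar{N}(t_1, t_2)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{t_2 - t_1} \sum_{i=t_1}^{t_2} \frac{dN(i)}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{7.15}$$

5. Maximum (or minimum) density over an interval,

$$M(t_1, t_2) = \max_{t_1 \le i \le t_2} N(i). \tag{7.16}$$

Let $\hat{t}$ be the time such that $M(t_1, t_2) = N(\hat{t})$. Then, except in the unlikely event
of ties,

$$\frac{dM(t_1, t_2)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{c}^{\mathsf{T}}\,\frac{d\mathbf{n}(\hat{t})}{d\boldsymbol{\theta}^{\mathsf{T}}}, \tag{7.17}$$

with a similar expression for the minimum.

6. Variance in density over an interval $t_1 \le t \le t_2$,

$$V(t_1, t_2) = \frac{1}{t_2 - t_1} \sum_{i=t_1}^{t_2} \left(N(i) - \bar{N}(t_1, t_2)\right)^2. \tag{7.18}$$

The sensitivity of $V$ is

$$\frac{dV(t_1, t_2)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{2}{t_2 - t_1} \sum_{i=t_1}^{t_2} \left(N(i) - \bar{N}(t_1, t_2)\right) \left(\frac{dN(i)}{d\boldsymbol{\theta}^{\mathsf{T}}} - \frac{d\bar{N}(t_1, t_2)}{d\boldsymbol{\theta}^{\mathsf{T}}}\right). \tag{7.19}$$

7. The transient population growth rate at time $t$,

$$r(t) = \log \frac{N(t+1)}{N(t)}. \tag{7.21}$$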
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The sensitivity of r is 


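$$\frac{dr(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{\mathbf{c}^{\mathsf{T}}}{N(t+1)}\,\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} - \frac{\mathbf{c}^{\mathsf{T}}}{N(t)}\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{7.22}$$

8. Average growth rate over an interval $t_1 \le t \le t_2$,

$$\bar{r}(t_1, t_2) = \frac{1}{t_2 - t_1} \log \frac{N(t_2)}{N(t_1)}, \tag{7.23}$$

the sensitivity of which is

$$\frac{d\bar{r}(t_1, t_2)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{t_2 - t_1} \left(\frac{\mathbf{c}^{\mathsf{T}}}{N(t_2)}\,\frac{d\mathbf{n}(t_2)}{d\boldsymbol{\theta}^{\mathsf{T}}} - \frac{\mathbf{c}^{\mathsf{T}}}{N(t_1)}\,\frac{d\mathbf{n}(t_1)}{d\boldsymbol{\theta}^{\mathsf{T}}}\right). \tag{7.24}$$
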

7.4 Elasticity Analysis 


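Transient elasticities are easily calculated from the sensitivities. The elasticity of
$n_i(t)$ to $\theta_j$ is

$$\frac{\epsilon n_i(t)}{\epsilon \theta_j} = \frac{\theta_j}{n_i(t)}\,\frac{dn_i(t)}{d\theta_j}. \tag{7.25}$$

Creating a matrix of these elasticities requires multiplying column $j$ of $d\mathbf{n}/d\boldsymbol{\theta}^{\mathsf{T}}$ by
$\theta_j$ and dividing row $i$ by $n_i$. This is just

$$\frac{\epsilon \mathbf{n}(t)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \mathcal{D}\left[\mathbf{n}(t)\right]^{-1} \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\mathcal{D}\left[\boldsymbol{\theta}\right], \tag{7.26}$$

where $\mathcal{D}[\mathbf{x}]$ is a matrix with $\mathbf{x}$ on the diagonal and zeros elsewhere. The elasticity
of any other (scalar- or vector-valued) dependent variable $f(\mathbf{n}(t))$ is given by

$$\frac{\epsilon f}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \mathcal{D}\left[f(\mathbf{n})\right]^{-1} \frac{df(\mathbf{n}(t))}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\mathcal{D}\left[\boldsymbol{\theta}\right]. \tag{7.27}$$
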
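
The scaling in (7.26) and (7.27) amounts to two diagonal matrix products. As a minimal sketch in Python with NumPy (the function name `elasticity` and the two-stage matrix are hypothetical, and the parameters are taken to be the entries of $\mathbf{A}$):

```python
import numpy as np

def elasticity(S_t, y_t, theta):
    """Eqs. (7.26)/(7.27): D[y]^{-1} (dy/dtheta^T) D[theta]; y must have no zero entries."""
    return np.diag(1.0 / np.atleast_1d(y_t)) @ np.atleast_2d(S_t) @ np.diag(theta)

A = np.array([[0.5, 2.0],
              [0.3, 0.8]])             # hypothetical matrix
theta = A.flatten(order="F")           # theta = vec A (column-major)
s, steps = 2, 4
n, S = np.array([1.0, 1.0]), np.zeros((s, s * s))
for t in range(steps):
    S = A @ S + np.kron(n[None, :], np.eye(s))   # Eq. (7.5)
    n = A @ n
E = elasticity(S, n, theta)            # s x s^2 matrix of elasticities of n(steps)
```

Because $\mathbf{n}(t) = \mathbf{A}^t \mathbf{n}_0$ is homogeneous of degree $t$ in the entries of $\mathbf{A}$, each row of `E` should sum to the number of steps; this gives a quick check on an implementation.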
Example: A transient outbreak: elasticity to lower-level parameters Consider 
a hypothetical size-classified population with 


$$\mathbf{A} = \begin{pmatrix}
0.3763 & 0 & 0.8431 & 8.4312 \\
0.1939 & 0.5421 & 0 & 0 \\
0 & 0.1177 & 0.5240 & 0 \\
0 & 0 & 0.1291 & 0.5254
\end{pmatrix}. \tag{7.28}$$
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Fig. 7.1 Dynamics of a 
transient population outbreak. 
The projection matrix (7.28) 
has $\lambda = 0.92$, but an initial
condition of a single adult 
leads to a rapid outbreak that 
lasts for over 25 years 


[Figure 7.1; vertical axis: density]


The asymptotic growth rate calculated as the dominant eigenvalue of $\mathbf{A}$ is
$\lambda = 0.92$, so the population is headed for eventual decline. However, the initial
condition

$$\mathbf{n}_0 = \begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix}^{\mathsf{T}} \tag{7.29}$$

(introduction of a large adult) produces a dramatic transient outbreak (Fig. 7.1),
during which total population increases by over 900% and remains above its initial
value for about 25 years.¹

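If this were a pest, its asymptotic fate (extinction) would be reassuring, but
$\lambda$ would reveal nothing about the transient outbreak. A manager might want to
know how changes in the lower-level survival probabilities $\sigma_i$, growth probabilities
$\gamma_i$, and fertilities $f_i$ would affect the outbreak, where the elements of $\mathbf{A}$
are

$$\begin{aligned}
a_{i+1,i} &= \sigma_i \gamma_i, & i &= 1, 2, 3 \\
a_{i,i} &= \sigma_i (1 - \gamma_i), & i &= 1, 2, 3 \\
a_{4,4} &= \sigma_4 \\
a_{1,i} &= f_i, & i &= 3, 4.
\end{aligned} \tag{7.30}$$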

If the impact of the pest were related to size, the manager might measure population
density with weights, say $\mathbf{c}^{\mathsf{T}} = (1 \;\; 2 \;\; 3 \;\; 4)$. Two measures of damage might be
the maximum of the outbreak and the cumulative population size over the entire
outbreak. Finally, to put everything on a proportional basis, the manager might want
to use elasticities.


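¹ The curious reader may wish to know that $\mathbf{A}$ was obtained by a random search for size-classified
matrices with high reactivity (Neubert and Caswell 1997; Caswell and Neubert 2005; Verdy and
Caswell 2008).

Define $\boldsymbol{\theta}$ as the $9 \times 1$ vector whose entries are $\sigma_1$–$\sigma_4$, $\gamma_1$–$\gamma_3$, and $f_3$–$f_4$. The
derivatives $d\operatorname{vec}\mathbf{A}/d\boldsymbol{\theta}^{\mathsf{T}}$ are obtained from (7.30). The sensitivity of $\mathbf{n}(t)$ to changes
in $\boldsymbol{\theta}$ is given by (7.6). Using (7.9) and (7.27) we obtain the elasticity of $N(t)$ to $\boldsymbol{\theta}$ as

$$\frac{\epsilon N(t)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{N(t)}\,\mathbf{c}^{\mathsf{T}}\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\mathcal{D}(\boldsymbol{\theta}). \tag{7.31}$$

The peak of the outbreak occurs at $t = 2$; thus (7.17) gives the elasticity of the peak
density to $\boldsymbol{\theta}$ as

$$\frac{\epsilon N(2)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{N(2)}\,\mathbf{c}^{\mathsf{T}}\,\frac{d\mathbf{n}(2)}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\mathcal{D}(\boldsymbol{\theta}). \tag{7.32}$$

The cumulative density up to time $t$ is given by (7.12) and the sensitivity by (7.13),
so the elasticity is

$$\frac{\epsilon C(t)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{C(t)}\,\mathbf{c}^{\mathsf{T}} \sum_{i=0}^{t} \frac{d\mathbf{n}(i)}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\mathcal{D}(\boldsymbol{\theta}). \tag{7.33}$$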


[Figure 7.2 panels: Maximum density; Cumulative density, t = 5; Cumulative density, t = 25; Asymptotic rate λ. Horizontal axis of each panel: parameters s1–s4, g1–g3, f3, f4.]

Fig. 7.2 The elasticities of the maximum population density, of the cumulative densities up to
t = 5 and t = 25, and of $\lambda$ to the lower-level demographic parameters, for the outbreak shown in
Fig. 7.1

Results are shown in Fig. 7.2. The elasticities of the maximum outbreak density are
very different from those of $\lambda$. The elasticity of the cumulative density over the first
5 years has a similar pattern, also very different from that of $\lambda$. However, by the end
of the outbreak (25 years) the elasticity of cumulative density is quite similar to that
of $\lambda$, so management over this time scale could reasonably rely on the elasticity of
$\lambda$ to compare control tactics. Intermediate steps and MATLAB code are found in an
appendix to Caswell (2007).
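
The outbreak calculation can be sketched in Python with NumPy; this is an illustration only and not the MATLAB code of Caswell (2007). It assumes the usual size-classified parameterization, $a_{i+1,i} = \sigma_i\gamma_i$, $a_{i,i} = \sigma_i(1-\gamma_i)$, $a_{4,4} = \sigma_4$, $a_{1,3} = f_3$, $a_{1,4} = f_4$, and recovers the nine lower-level parameters from the entries of (7.28) under that assumption. The function names are hypothetical.

```python
import numpy as np

A0 = np.array([[0.3763, 0.0,    0.8431, 8.4312],
               [0.1939, 0.5421, 0.0,    0.0   ],
               [0.0,    0.1177, 0.5240, 0.0   ],
               [0.0,    0.0,    0.1291, 0.5254]])   # Eq. (7.28)

# Lower-level parameters implied by A0 under the assumed parameterization
sigma = np.array([A0[0, 0] + A0[1, 0], A0[1, 1] + A0[2, 1], A0[2, 2] + A0[3, 2], A0[3, 3]])
gamma = np.array([A0[1, 0], A0[2, 1], A0[3, 2]]) / sigma[:3]
theta = np.concatenate([sigma, gamma, [A0[0, 2], A0[0, 3]]])   # 9 parameters

def build_A(th):
    sig, gam = th[:4], th[4:7]
    A = np.zeros((4, 4))
    for i in range(3):
        A[i, i] = sig[i] * (1 - gam[i])
        A[i + 1, i] = sig[i] * gam[i]
    A[3, 3], A[0, 2], A[0, 3] = sig[3], th[7], th[8]
    return A

def dvecA_dtheta(th):
    sig, gam = th[:4], th[4:7]
    D = np.zeros((16, 9))
    k = lambda i, j: i + 4 * j              # position of a_ij in column-major vec A
    for i in range(3):
        D[k(i, i), i], D[k(i, i), 4 + i] = 1 - gam[i], -sig[i]
        D[k(i + 1, i), i], D[k(i + 1, i), 4 + i] = gam[i], sig[i]
    D[k(3, 3), 3] = D[k(0, 2), 7] = D[k(0, 3), 8] = 1.0
    return D

def outbreak(th, T=25, c=np.array([1.0, 2.0, 3.0, 4.0])):
    A, dA = build_A(th), dvecA_dtheta(th)
    n, S = np.array([0.0, 0.0, 0.0, 1.0]), np.zeros((4, 9))   # Eqs. (7.29), (7.7)
    N, E = [c @ n], [np.zeros(9)]
    for _ in range(T):
        S = A @ S + np.kron(n[None, :], np.eye(4)) @ dA        # Eq. (7.6)
        n = A @ n
        N.append(c @ n)
        E.append(c @ S @ np.diag(th) / (c @ n))                # Eq. (7.31)
    return np.array(N), np.array(E)

N, E = outbreak(theta)
t_peak = int(np.argmax(N))     # time of maximum weighted density
E_peak = E[t_peak]             # Eq. (7.32): elasticities of the peak density
```

Under these assumptions the weighted density peaks at $t = 2$, in agreement with the text, and `E_peak` holds the elasticities of the peak to $\sigma_1$–$\sigma_4$, $\gamma_1$–$\gamma_3$, $f_3$, and $f_4$, in that order.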


7.5 Sensitivity of Time-Varying Models 


Now consider the time-varying model 
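$$\mathbf{n}(t+1) = \mathbf{A}_t\,\mathbf{n}(t), \qquad \mathbf{n}(0) = \mathbf{n}_0, \tag{7.34}$$

where $\mathbf{A}_t$, $t = 1, \ldots, T$, is a specified sequence of matrices.
Take the differential of both sides of (7.34),

$$d\mathbf{n}(t+1) = \mathbf{A}_t\,d\mathbf{n}(t) + (d\mathbf{A}_t)\,\mathbf{n}(t), \tag{7.35}$$

and apply the vec operator to obtain

$$d\mathbf{n}(t+1) = \mathbf{A}_t\,d\mathbf{n}(t) + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) d\operatorname{vec}\mathbf{A}_t. \tag{7.36}$$

Not only the transient behavior of the population, but also the parameter vector
$\boldsymbol{\theta}$, the matrix $\mathbf{A}_t$, and the perturbation applied to $\boldsymbol{\theta}$ may change over time. The
sensitivity analysis must reflect both types of variation. So, let us treat $\mathbf{A}_t$ as a
function of $\boldsymbol{\theta}(t)$, and consider a perturbation of $\boldsymbol{\theta}$ at some time $u$. Applying the
chain rule to (7.36), we obtain

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \mathbf{A}_t\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{d\operatorname{vec}\mathbf{A}_t}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}, \tag{7.37}$$

which has the same form as (7.6) except that the matrix and the matrix derivative
vary over time.
Some useful simplifications follow from this formulation.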


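1. Perturbation of matrix elements. If $\boldsymbol{\theta}(t)$ consists of the elements of $\operatorname{vec}\mathbf{A}_t$, then

$$\frac{d\operatorname{vec}\mathbf{A}_t}{d\boldsymbol{\theta}^{\mathsf{T}}(t)} = \mathbf{I}_{s^2} \tag{7.38}$$

and can be eliminated from the expressions where it appears.

2. No time travel. Suppose that $\boldsymbol{\theta}(t)$ is perturbed at some time $t = u$. Then

$$\frac{d\operatorname{vec}\mathbf{A}_t}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \mathbf{0}_{s^2 \times p} \qquad \text{for } t < u. \tag{7.39}$$

However, the effects of the perturbation continue after $t = u$, so that
$d\mathbf{n}(t)/d\boldsymbol{\theta}^{\mathsf{T}}(u)$ will generally be non-zero for $t > u$.

3. Perturbations at every time. A permanent modification of the parameters can be
considered a perturbation of $\boldsymbol{\theta}(t)$ for every time $t = 0, 1, \ldots$, so that

$$\boldsymbol{\theta}(t) \to \boldsymbol{\theta}(t) + d\boldsymbol{\theta}. \tag{7.40}$$

The sensitivity of the population vector is then

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{A}_t\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{d\operatorname{vec}\mathbf{A}_t}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{7.41}$$

4. Perturbation over a range of times. One might be interested in perturbation over
some time period $T_1 \le t \le T_2$. The effect of such a perturbation on transient
dynamics is

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{A}_t\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} + J(t) \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{d\operatorname{vec}\mathbf{A}_t}{d\boldsymbol{\theta}^{\mathsf{T}}}, \tag{7.42}$$

where $J(t)$ is an indicator variable

$$J(t) = \begin{cases} 1 & T_1 \le t \le T_2 \\ 0 & \text{otherwise.} \end{cases} \tag{7.43}$$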


These calculations have been extended to apply to population projections (Caswell 
and Sanchez Gassen 2015; Sanchez Gassen and Caswell 2018); see Sect. 7.8 below. 
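
A perturbation confined to a window of times, as in (7.42) and (7.43), changes the iteration only by switching the second term on and off. The following Python sketch (NumPy) assumes that the parameters are the matrix entries themselves, so that (7.38) applies, and uses a zero-based time index for convenience; the function name and the two alternating matrices are hypothetical.

```python
import numpy as np

def windowed_sensitivity(A_seq, n0, T1, T2):
    """Eq. (7.42) with theta(t) = vec A_t, so dvec A_t/dtheta^T = I (Eq. 7.38).

    A_seq[t] maps n(t) to n(t+1); the same perturbation d(vec A) is applied
    at every t with T1 <= t <= T2 (the indicator J(t) of Eq. 7.43).
    """
    s = len(n0)
    n, S = np.asarray(n0, dtype=float), np.zeros((s, s * s))
    for t, A in enumerate(A_seq):
        J = 1.0 if T1 <= t <= T2 else 0.0
        S = A @ S + J * np.kron(n[None, :], np.eye(s))
        n = A @ n
    return n, S

# Hypothetical environment alternating between two matrices
good = np.array([[0.5, 2.0], [0.3, 0.8]])
poor = np.array([[0.4, 1.0], [0.2, 0.7]])
A_seq = [good, poor] * 4                      # t = 0, ..., 7
n8, S8 = windowed_sensitivity(A_seq, [1.0, 1.0], T1=3, T2=5)
# Stopping before the window opens leaves the sensitivity at zero ("no time travel")
_, S3 = windowed_sensitivity(A_seq[:3], [1.0, 1.0], T1=3, T2=5)
```

Setting `T1 = 0` and `T2` to the last time step recovers the permanent perturbation of (7.41).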


7.6 Sensitivity of Subsidized Populations 


An interesting special case of time-varying models is that of subsidized populations 
(e.g., Pascual and Caswell 1991), which receive an input of individuals,²

$$\mathbf{n}(t+1) = \mathbf{A}_t\,\mathbf{n}(t) + \mathbf{b}(t). \tag{7.44}$$


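The subsidy vector $\mathbf{b}(t)$ might represent immigration, or the introduction of
individual animals from a captive release program, or dispersal of the larvae of
marine invertebrates or the seeds of plants. If $\mathbf{b}(t) < 0$, then it could represent
the removal or harvest of individuals from the population (e.g., Hauser et al. 2006).³
Differentiating gives:

$$\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} = \mathbf{A}_t\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{d\operatorname{vec}\mathbf{A}_t}{d\boldsymbol{\theta}^{\mathsf{T}}} + \frac{d\mathbf{b}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}. \tag{7.45}$$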


² See Chap. 10 and Caswell (2008) for analysis of the equilibria of both linear and nonlinear
versions of this equation, with applications to organizational dynamics and marine invertebrates.

³ This type of harvest is unstable in the long run, but we are dealing here with transient dynamics.

If $\boldsymbol{\theta}$ affects only the vital rates and not the subsidy process, then (7.45) reduces
to (7.37), and subsidy affects the sensitivity only through its effect on $\left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right)$.
On the other hand, setting $\boldsymbol{\theta} = \mathbf{b}$ gives the effect of changes in the subsidy process:

$$\frac{d\mathbf{n}(t+1)}{d\mathbf{b}^{\mathsf{T}}} = \mathbf{A}_t\,\frac{d\mathbf{n}(t)}{d\mathbf{b}^{\mathsf{T}}} + \mathbf{I}_s. \tag{7.46}$$
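
For a constant matrix and a constant subsidy, (7.46) is a particularly simple iteration. A minimal sketch in Python with NumPy follows; the three-stage matrix and the release vector are hypothetical.

```python
import numpy as np

def subsidized(A, b, n0, T):
    """Project Eq. (7.44) with constant A and b, iterating Eq. (7.46) for dn(t)/db^T."""
    s = len(n0)
    n, S = np.asarray(n0, dtype=float), np.zeros((s, s))
    for _ in range(T):
        S = A @ S + np.eye(s)      # Eq. (7.46)
        n = A @ n + b              # Eq. (7.44)
    return n, S

A = np.array([[0.0, 0.0, 1.2],
              [0.6, 0.0, 0.0],
              [0.0, 0.8, 0.9]])    # hypothetical three-stage matrix
b = np.array([5.0, 0.0, 2.0])      # hypothetical annual releases into stages 1 and 3
n10, S10 = subsidized(A, b, np.zeros(3), T=10)
```

In this special case the iteration accumulates powers of the matrix, $d\mathbf{n}(t)/d\mathbf{b}^{\mathsf{T}} = \mathbf{I} + \mathbf{A} + \cdots + \mathbf{A}^{t-1}$, and because the model is linear in $\mathbf{b}$, a change $\boldsymbol{\delta}$ in the subsidy changes $\mathbf{n}(t)$ by exactly that matrix times $\boldsymbol{\delta}$.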


Example: A subsidized model for the reintroduction of the Griffon vulture 
The griffon vulture (Gyps fulvus) was once widely distributed in Europe, but 
has been eliminated from many areas, due primarily to poisoning and shooting. 
A reintroduction program has re-established a population in the Massif Central 
of southern France; Sarrazin and Legendre (2000) have analyzed this program. 
Reintroduction programs are increasingly important in conservation biology (Sar- 
razin and Barbault 1996; Snyder and Snyder 2000), and will become an important 
application of subsidized models. Transient dynamics are naturally critical for 
evaluating reintroduction programs, because the programs are of finite duration 
and are evaluated by short-term measures of success at, or shortly after, their 
conclusion. 

In the case of the griffon vulture, birds can be introduced as juveniles or adults. 
Adults introduced from captivity have lower fertility and lower survival than wild 
adults. Here I use a simplification of the Sarrazin-Legendre model to show how 
transient sensitivity analysis could be used. The life cycle contains four age classes 
and a stage representing captive-reared adults (Fig. 7.3a). The survival of released 
adults is a fraction p of that of wild adults, and their fertility a fraction q of that 
of the wild adults. I assume these costs persist indefinitely; Sarrazin and Legendre 
(2000) explore both short- and long-term costs. Suppose that a manager is interested
in the effects of the annual number $b_1$ of juveniles released, the number $b_5$ of
adults released, and the relative survival $p$ and relative fertility $q$ of captive-reared
adults.

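One measure of success will be the population size at the end of the introduction
program. The best such population, in terms of future population size, would be one
with the highest total reproductive value, $N = \mathbf{v}^{\mathsf{T}}\mathbf{n}$ (also called the stable equivalent
population; see Chapters 8–9 of Keyfitz and Caswell 2005). The elasticity of stable
equivalent population size⁴ is

$$\frac{\epsilon N(t)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{N(t)}\,\mathbf{v}^{\mathsf{T}}\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\mathcal{D}(\boldsymbol{\theta}), \tag{7.47}$$

where $\mathbf{v}$ is the reproductive value vector from $\mathbf{A}$ and $\boldsymbol{\theta}^{\mathsf{T}} = (b_1 \;\; b_5 \;\; p \;\; q)$.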


⁴ The parameters under investigation here do not affect the reproductive value vector $\mathbf{v}$. To analyze
the sensitivity of stable equivalent population to, say, $\sigma_i$, would require the derivative of $\mathbf{v}$ as well;
this is presented in Chap. 10.

[Figure 7.3b; vertical axis: elasticity of stable equivalent population]

Fig. 7.3 (a) The life cycle graph and (b) the transient elasticity of stable equivalent population size
$N(t) = \mathbf{v}^{\mathsf{T}}\mathbf{n}(t)$ to changes in juvenile introductions ($b_1$), adult introductions ($b_5$), adult survival
costs ($p$), and adult fertility costs ($q$) for the Griffon vulture. Parameter values from Sarrazin and
Legendre (2000): survival probabilities 0.86 and 0.98, $f = 0.33$, $p = 0.75$, $q = 0.51$


Using parameter values in Sarrazin and Legendre (2000) and setting $b_1 = b_5$
(i.e., evaluating the value of juveniles and adults from a situation where they 
are introduced in equal numbers) gives the result in Fig. 7.3b, for an introduction 
program duration of up to 20 years. 

It is always better to increase the number of juveniles relative to the number
of adults introduced. The benefits of reducing survival and fertility costs (i.e.,
increasing $p$ or $q$) increase with the duration of the program, as they have longer
times available to operate. Reductions in the survival cost would have more impact
than reductions in the fertility costs. These results are strongly influenced by the
fact that the reproductive value of captive-reared adults is lower than that of newly
fledged or released juveniles, which is reflected in the high elasticity of $N(t)$ to
juvenile releases.


7.7 Sensitivity of Nonlinear Models 


In density- or frequency-dependent models, the vital rates depend on the parameters
$\boldsymbol{\theta}$ and current population density $\mathbf{n}(t)$:

$$\mathbf{n}(t+1) = \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]\,\mathbf{n}(t). \tag{7.48}$$

Changes in $\boldsymbol{\theta}$ affect dynamics directly, through $\mathbf{A}$, and indirectly, through $\mathbf{n}(t)$. The
transient sensitivity of $\mathbf{n}(t)$ to parameter changes must include both effects.

Differentiating both sides of (7.48) and applying the vec operator gives the 
familiar differential expression 


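$$d\mathbf{n}(t+1) = \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]\,d\mathbf{n}(t) + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) d\operatorname{vec}\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]. \tag{7.49}$$

But now, unlike in the linear case, $d\operatorname{vec}\mathbf{A}$ includes both direct effects through $\boldsymbol{\theta}$ and
indirect effects through $\mathbf{n}$, so the total differential is

$$d\operatorname{vec}\mathbf{A} = \frac{\partial \operatorname{vec}\mathbf{A}}{\partial \boldsymbol{\theta}^{\mathsf{T}}}\,d\boldsymbol{\theta} + \frac{\partial \operatorname{vec}\mathbf{A}}{\partial \mathbf{n}^{\mathsf{T}}}\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}\,d\boldsymbol{\theta}. \tag{7.50}$$

Substituting (7.50) into (7.49) gives

$$\begin{aligned}
\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} ={}& \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{\partial \operatorname{vec}\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]}{\partial \boldsymbol{\theta}^{\mathsf{T}}} \\
&+ \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{\partial \operatorname{vec}\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]}{\partial \mathbf{n}^{\mathsf{T}}}\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}.
\end{aligned} \tag{7.51}$$

The first two terms are familiar from the density-independent case; the third term
accounts for the effects of $\boldsymbol{\theta}$ on $\mathbf{A}$ through its effects on $\mathbf{n}(t)$. Rearranging terms
gives the transient sensitivity,

$$\begin{aligned}
\frac{d\mathbf{n}(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}} ={}& \left\{\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)] + \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{\partial \operatorname{vec}\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]}{\partial \mathbf{n}^{\mathsf{T}}}\right\} \frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}} \\
&+ \left(\mathbf{n}^{\mathsf{T}}(t) \otimes \mathbf{I}_s\right) \frac{\partial \operatorname{vec}\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]}{\partial \boldsymbol{\theta}^{\mathsf{T}}}.
\end{aligned} \tag{7.52}$$
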
Example: Transient sensitivity of Tribolium Flour beetles of the genus Tribolium 
have been used for a series of models of, and experiments on, nonlinear dynamics, 
reviewed by Cushing et al. (2003). Tribolium lives in stored flour. Adults and 
larvae cannibalize eggs, and adults cannibalize pupae; these interactions provide the 
density-dependence, and are captured in a three-stage (larvae, pupae, and adults) 
model, with 


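$$\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}] = \begin{pmatrix}
0 & 0 & b \exp(-c_{el} n_1 - c_{ea} n_3) \\
1 - \mu_l & 0 & 0 \\
0 & \exp(-c_{pa} n_3) & 1 - \mu_a
\end{pmatrix}, \tag{7.53}$$

where $b$ is the clutch size, $c_{ea}$, $c_{el}$, and $c_{pa}$ are cannibalism rates (of eggs by adults,
eggs by larvae, and pupae by adults), and $\mu_l$ and $\mu_a$ are larval and adult mortalities.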
Parameter values from experiments reported by Costantino et al. (1997) give the 
transient dynamics in Fig. 7.4, following introduction of a single adult. 

The sensitivity of this transient behavior requires the derivatives of A[0, n] to the 
parameters and to the densities. Substituting these derivatives into (7.52) gives the 
transient sensitivities by a simple iteration. The derivative matrices are given in an 
appendix to Caswell (2007). 
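
The iteration of (7.52) for the matrix (7.53) can be sketched as follows in Python with NumPy. Two points are assumptions of this sketch rather than features of the original analysis: the parameter values are purely illustrative (they are not the estimates of Costantino et al. 1997), and the derivative matrices are approximated by central differences instead of the analytic expressions given in the appendix to Caswell (2007).

```python
import numpy as np

def A_of(theta, n):
    """Projection matrix of Eq. (7.53); stages are larvae, pupae, adults."""
    b, c_ea, c_el, c_pa, mu_l, mu_a = theta
    return np.array([[0.0, 0.0, b * np.exp(-c_el * n[0] - c_ea * n[2])],
                     [1.0 - mu_l, 0.0, 0.0],
                     [0.0, np.exp(-c_pa * n[2]), 1.0 - mu_a]])

def vecA(theta, n):
    return A_of(theta, n).flatten(order="F")     # column-major vec

def jac(f, x, h=1e-6):
    """Central-difference Jacobian of f at x (a numerical stand-in for analytic derivatives)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h * max(1.0, abs(x[k]))
        cols.append((f(x + e) - f(x - e)) / (2 * e[k]))
    return np.column_stack(cols)

def nonlinear_sensitivity(theta, n0, T):
    s, p = len(n0), len(theta)
    n, S = np.asarray(n0, dtype=float), np.zeros((s, p))
    for _ in range(T):
        A = A_of(theta, n)
        K = np.kron(n[None, :], np.eye(s))               # (n^T(t) kron I_s)
        dA_dn = jac(lambda x: vecA(theta, x), n)         # d vec A / d n^T
        dA_dth = jac(lambda x: vecA(x, n), theta)        # d vec A / d theta^T
        S = (A + K @ dA_dn) @ S + K @ dA_dth             # Eq. (7.52)
        n = A @ n                                        # Eq. (7.48)
    return n, S

# Illustrative values only: b, c_ea, c_el, c_pa, mu_l, mu_a
theta = np.array([7.0, 0.01, 0.01, 0.005, 0.2, 0.1])
n_T, S = nonlinear_sensitivity(theta, [0.0, 0.0, 1.0], T=10)   # single adult introduced
```

The sensitivity matrix `S` has one row per stage and one column per parameter; scaling it as in (7.27) gives elasticities.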
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Fig. 7.4 (a) The transient dynamics of the Tribolium model following introduction of a single 
adult. Parameters from Costantino et al. (1997). (b) The transient elasticity of the metabolic 
population size $N_m(t)$ to each of the parameters of the Tribolium model, for the first 20 time
steps following the introduction of a single adult 


Tribolium is a pest. The damage it causes might, I suppose, be related to its 
consumption, which might be measured by the metabolic rate. Emekci et al. (2001) 
estimated the per capita metabolic rate of larvae, pupae, and adults. Using their 
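results, we define the metabolic population size as $N_m(t) = \mathbf{c}^{\mathsf{T}}\mathbf{n}(t)$ where
$\mathbf{c}^{\mathsf{T}} = (9 \;\; 1 \;\; 4.5)$ μl CO₂ h⁻¹. The elasticities of $N_m(t)$ to the parameters are

$$\frac{\epsilon N_m(t)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}} = \frac{1}{N_m(t)}\,\mathbf{c}^{\mathsf{T}}\,\frac{d\mathbf{n}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}}\,\mathcal{D}(\boldsymbol{\theta}), \tag{7.54}$$

for $t = 1, \ldots, 20$.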

The results are shown in Fig. 7.4. For the first 5 or so iterations, $N_m$ is more
elastic to the clutch size than to the cannibalism or mortality rates. After that,
the impact of $b$ declines and the impact (negative) of the cannibalism coefficients
increases. Beyond 10 time steps, $N_m$ is affected primarily by $b$ (positively) and
$c_{ea}$ (negatively). Changes in mortality ($\mu_a$ and $\mu_l$) have only small effects. Such
changes in the relative impact of the parameters over short periods of time are
typical of transient sensitivities. Interestingly, the elasticities of total population size
$N_{\mathrm{tot}} = \sum_i n_i$ (not shown) show a similar pattern, but lack the period-2 fluctuation
evident in Fig. 7.4. This reflects the interaction of the weighting pattern (much
more uneven in the calculation of $N_m$ than $N_{\mathrm{tot}}$) and transient fluctuations in
the stage distribution. Asymptotic sensitivity calculations are unaffected by such
differences.

The parameter values used here lead to a stable equilibrium, but the transient
calculations apply equally to other types of dynamics.
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7.8 Sensitivity of Population Projections 


The most common transient analyses of populations appear in the population 
projections provided by local, national, and international offices. These projections 
are usually carried out by the cohort component method, which uses mortality, 
fertility, and migration to describe the dynamics of each age×sex combination.
The calculations are transient because they begin with the current, rather than an
asymptotic, age-sex distribution and are carried out over a short time horizon 
(usually a few decades). In the first issue of the first volume of the then-new 
journal Demography, Nathan Keyfitz described the “population projection as a 
matrix operator” (Keyfitz 1964). He showed that population projections using 
the cohort component method could be written as matrix population models, and 
emphasized the value in doing so to focus attention on the mathematical structure 
of the projection, inviting deeper analyses of its properties with more powerful 
mathematical tools. Considering projections as matrix operators allows the use of 
matrix calculus methods to develop a thorough perturbation analysis of population 
projections (Caswell and Sanchez Gassen 2015; Sanchez Gassen and Caswell 
2018). 

To present the basics of projection sensitivity analysis, we begin with a simple 
one-sex model, but we focus most of our attention on a two-sex model that includes 
separate rates for males and females. 

The single-sex projection can be written as 


$$\mathbf{n}(t+1) = \mathbf{A}(t)\,\mathbf{n}(t) + \mathbf{b}(t), \qquad \mathbf{n}(0) = \mathbf{n}_0 \tag{7.55}$$


where n(t) is a vector whose entries are the numbers of individuals in each age class 
or stage at time $t$, $\mathbf{A}(t)$ is a projection matrix incorporating the vital rates at time $t$,
and $\mathbf{b}(t)$ is a vector giving the number of immigrants in each age class or stage at
time $t$. The projection begins with a specified initial condition, denoted $\mathbf{n}_0$, and is
carried out until some target time T. 

To develop a two-sex projection, we define population vectors $\mathbf{n}_f$ and $\mathbf{n}_m$, and
projection matrices $\mathbf{A}_f$ and $\mathbf{A}_m$, for females and males, respectively. We assume
that reproduction is female dominant,⁵ so all fertility is attributed to females. We
decompose the projection matrices for females and males into

$$\mathbf{A}_f(t) = \mathbf{U}_f(t) + \mathbf{F}(t) \tag{7.56}$$

$$\mathbf{A}_m(t) = \mathbf{U}_m(t) \tag{7.57}$$


where U describes transitions and survival of extant individuals and F describes the 
production of new individuals by reproduction. 


⁵ Two-sex models that do not assume dominance by one sex have been used to project animal
populations, but not, as far as I know, human populations (e.g., Jenouvrier et al. 2009, 2010, 2012).
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In an age-classified model, F will have effective fertilities (including infant and 
maternal survival as appropriate) on the first row and zeros elsewhere. A proportion 
$\phi$ of the offspring are female. This model attributes reproduction to females; hence
there is no need to create separate fertility matrices for reproduction by males and 
females. 

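The male component of the population is projected by the survival matrix $\mathbf{U}_m$; the
input of new individuals comes from the female population. The projection model
becomes

$$\mathbf{n}_f(t+1) = \left[\mathbf{U}_f(t) + \phi \mathbf{F}(t)\right] \mathbf{n}_f(t) + \mathbf{b}_f(t) \tag{7.58}$$

$$\mathbf{n}_m(t+1) = \mathbf{U}_m(t)\,\mathbf{n}_m(t) + (1 - \phi)\,\mathbf{F}(t)\,\mathbf{n}_f(t) + \mathbf{b}_m(t) \tag{7.59}$$
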
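The sensitivity of the two-sex projection is given by the two derivatives,

$$\frac{d\mathbf{n}_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} \qquad \text{and} \qquad \frac{d\mathbf{n}_m(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}.$$

These sensitivities are obtained from dynamic expressions, for the female
population

$$\begin{aligned}
\underbrace{\frac{d\mathbf{n}_f(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}_{\text{sensitivity at } t+1}
={}& \left(\mathbf{U}_f(t) + \phi \mathbf{F}(t)\right) \underbrace{\frac{d\mathbf{n}_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}_{\text{sensitivity at } t}
+ \underbrace{\left(\mathbf{n}_f^{\mathsf{T}}(t) \otimes \mathbf{I}_\omega\right) \left(\frac{d\operatorname{vec}\mathbf{U}_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} + \phi\,\frac{d\operatorname{vec}\mathbf{F}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}\right)}_{\text{effects via female transitions and fertility}} \\
&+ \underbrace{\frac{d\mathbf{b}_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}_{\text{effects via immigration}}
\end{aligned} \tag{7.60}$$

and the male population

$$\begin{aligned}
\underbrace{\frac{d\mathbf{n}_m(t+1)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}_{\text{sensitivity at } t+1}
={}& \underbrace{\mathbf{U}_m(t)\,\frac{d\mathbf{n}_m(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} + (1 - \phi)\,\mathbf{F}(t)\,\frac{d\mathbf{n}_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}_{\text{sensitivities at } t}
+ \underbrace{\left(\mathbf{n}_m^{\mathsf{T}}(t) \otimes \mathbf{I}_\omega\right) \frac{d\operatorname{vec}\mathbf{U}_m(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}_{\text{effects via male transitions}} \\
&+ \underbrace{(1 - \phi) \left(\mathbf{n}_f^{\mathsf{T}}(t) \otimes \mathbf{I}_\omega\right) \frac{d\operatorname{vec}\mathbf{F}(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}_{\text{effects via female fertility}}
+ \underbrace{\frac{d\mathbf{b}_m(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}}_{\text{effects via immigration}}
\end{aligned} \tag{7.61}$$

where $\omega$ is the number of age classes. Equations (7.60) and (7.61) are iterated from initial conditions

$$\frac{d\mathbf{n}_f(0)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \frac{d\mathbf{n}_m(0)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)} = \mathbf{0}_{\omega \times p} \tag{7.62}$$

along with the iteration of equations (7.58) and (7.59) for the population vectors
$\mathbf{n}_f(t)$ and $\mathbf{n}_m(t)$. For complete details, see Caswell and Sanchez Gassen (2015).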

The terms in (7.61) are labelled to show how the processes of transitions, fertility 
and migration, for males and females, combine to produce sensitivity of a transient 
population. As before, the sensitivity at $t + 1$ depends on the sensitivity at time $t$
and on the effects of the parameter vector on the transition and fertility matrices and 
on the immigration vector. In the next section we turn to the calculation of these 
derivatives. 
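
As an illustration of how the female and male recursions fit together, the following Python sketch (NumPy) iterates the two-sex projection and its sensitivities for the simplest case, in which the parameter affects only the fertility matrix, so that the derivatives of the transition matrices and of the immigration vectors are zero. The three-age-class matrices, the immigration vector, the sex ratio, and the single parameter (a multiplier applied to all fertilities) are hypothetical and time-invariant.

```python
import numpy as np

def two_sex(Uf, Um, F, phi, bf, bm, nf0, nm0, dvecF, T):
    """Iterate the two-sex projection and its sensitivity when theta affects only F.

    dvecF : (w*w, p) derivative of vec F (column-major) with respect to theta^T
    """
    w, p = len(nf0), dvecF.shape[1]
    nf, nm = np.asarray(nf0, dtype=float), np.asarray(nm0, dtype=float)
    Sf, Sm = np.zeros((w, p)), np.zeros((w, p))            # zero initial sensitivities
    for _ in range(T):
        K = np.kron(nf[None, :], np.eye(w))                # (n_f^T(t) kron I)
        # males first, so that both updates use the female quantities at time t
        Sm = Um @ Sm + (1 - phi) * F @ Sf + (1 - phi) * K @ dvecF
        Sf = (Uf + phi * F) @ Sf + phi * K @ dvecF
        nm = Um @ nm + (1 - phi) * F @ nf + bm
        nf = (Uf + phi * F) @ nf + bf
    return nf, nm, Sf, Sm

Uf = np.diag([0.9, 0.8], k=-1)           # hypothetical female survival
Um = np.diag([0.85, 0.75], k=-1)         # hypothetical male survival
F = np.zeros((3, 3))
F[0, 1:] = [1.0, 1.5]                    # fertilities on the first row
phi = 0.49                               # proportion of offspring that are female
bf = bm = np.array([1.0, 2.0, 0.5])      # hypothetical immigration
n0 = np.array([10.0, 8.0, 5.0])
dvecF = F.flatten(order="F")[:, None]    # theta scales all fertilities; evaluated at theta = 1
nf, nm, Sf, Sm = two_sex(Uf, Um, F, phi, bf, bm, n0, n0, dvecF, T=10)
```

The male sensitivity depends on the female sensitivity through the fertility term, which is why the two recursions must be iterated together.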

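The elasticities of $\mathbf{n}_f(t)$ are given by

$$\frac{\epsilon \mathbf{n}_f(t)}{\epsilon \boldsymbol{\theta}^{\mathsf{T}}(u)} = \mathcal{D}\left[\mathbf{n}_f(t)\right]^{-1} \frac{d\mathbf{n}_f(t)}{d\boldsymbol{\theta}^{\mathsf{T}}(u)}\,\mathcal{D}\left[\boldsymbol{\theta}(u)\right], \tag{7.63}$$

with a similar expression for $\mathbf{n}_m$.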

Caswell and Sanchez Gassen (2015) present a detailed analysis of a projection 
for the population of Spain, published by the Instituto Nacional de Estadistica (INE), 
for the years 2012-2052. They calculated the sensitivity and elasticity of total 
population, male and female population, the school age population (6-16 years), 
the part of the population expected to suffer from dementia, and the dependency 
and support ratios. All these outcomes are calculated from the basic projection 
using the methods in Sect. 7.3. In a more extensive comparison, Sanchez Gassen 
and Caswell (2018) have applied the approach to the Europop2013 projections for
the 28 member states of the European Union, plus Iceland, Norway, and Switzerland,
for the years 2013–2080.


7.9 Discussion 


In addition to their obvious role in population projections, transient effects are 
critically important in studies of climate change and other short-term management
issues (Ezard et al. 2010). A recent study found that simulations of invasive species 
were strongly influenced by transient effects (Muthukrishnan et al. 2018). Matrix 
calculus makes transient sensitivity analysis straightforward and applicable to a 
wide range of models and perturbations. The approach calculates sensitivities and 
elasticities as a dynamic system, iterated in parallel with the dynamics of the 
transient solution itself. 

This dynamic approach reveals the fundamental structure underlying the sensi- 
tivity calculation. The results bear a striking family resemblance, from the linear, 
time-invariant case (7.6), to the time-varying case (7.41), the case of subsidized 
populations (7.45), the nonlinear case (7.52), and the time-varying, two-sex, subsi- 
dized model that forms the basis for the cohort component method of population 
projection in equations (7.60) and (7.61).

The examples here sound like stories—suppose that someone (e.g., a manager) 
is interested in some aspect of the population (e.g., its total size, or variance, or 
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average growth,...) over some time interval. Or suppose that mortality, fertility, 
and immigration develop in the following way. This emphasizes the flexibility of
this approach, and also the importance of thinking clearly about the dependent
variables and time scales of interest. The list of dependent variables in Sect. 7.3 
can no doubt be extended. It may be repeating the obvious, but transient sensitivity 
analysis depends on initial conditions. Each of the examples had to choose an initial 
condition and argue for its relevance. 

Section 10.2.6 in Chap. 10 briefly considers the sensitivity analysis of equilibria
of continuous-time systems. Richard et al. (2015) have developed a very general
sensitivity analysis of transient dynamics in continuous systems (both linear and 
nonlinear). They point out and nicely demonstrate the parallels between continuous- 
time models and the discrete-time models considered here, the link being the 
creation of a dynamic model for the sensitivities that is solved along with the 
dynamics of the system itself.

Chapter 8
Periodic Models


8.1 Introduction 


Periodic matrix models are often used to study cyclical temporal variation (seasonal 
or interannual), sometimes as a (perhaps crude) approximation to stochastic models. 
However, formally periodic models also appear when multiple processes (e.g., 
demography and dispersal) operate within a single projection interval. The models 
take the form of periodic matrix products. A familiar example is when population 
projection over an annual interval is described as a product of seasonal operators. 
The perturbation analysis of periodic models (Caswell and Trevisan 1994; Lesnoff 
et al. 2003; Caswell and Shyu 2012) must specify both the vital rates affected by 
the perturbation and the timing of the perturbation within the cycle. This chapter 
presents a general approach to the perturbation analysis of both linear and nonlinear 
periodic models. The results consist of a series of analyses of some of the most 
commonly encountered periodic models. 

If the environment is time-invariant on the scale of a chosen projection interval 
(e.g., from year to year), the result is a periodic matrix population model in which 
the seasonal product repeats itself. Such a model can be written as 


$$\mathbf{n}(t+1) = \mathbf{B}_p \cdots \mathbf{B}_2 \mathbf{B}_1\,\mathbf{n}(t) \tag{8.1}$$

Here, $\mathbf{B}_i$ is the matrix at phase $i$ of the cycle and $p$ is the period. The period is the
number of phases in the cycle; i.e., the number of matrices in the periodic matrix
product in (8.1). Neither the identities nor the number of stages need be the same
from one phase to the next, so the matrices $\mathbf{B}_i$ may be rectangular rather than square.


Chapter 8 is modified, under the terms of a Journal Publishing Agreement with Elsevier Publishers, 
from: Caswell, H. and E. Shyu. 2012. Sensitivity analysis of periodic matrix population models. 
Theoretical Population Biology. 82:329-339. ©Elsevier. 
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The phases need not be the same length, so the period may or may not be measured 
in units of time. For example, in the model of Pico et al. (2002), each season is of 
2 months duration, and the period (p = 6) corresponds directly to a time scale. In 
contrast, the model of Hunter and Caswell (2005a) has three phases, with durations 
of 3 weeks, 5 weeks, and 10 months, respectively. The period (p = 3) of that model 
does not correspond to a time scale, but it identifies the number of matrices in the 
periodic product and appears in calculations in the same role as p = 6 in the model 
of Pico et al. (2002). 
The projection matrix over the entire periodic cycle is¹

$$\mathbf{A} = \mathbf{B}_p \cdots \mathbf{B}_2 \mathbf{B}_1 \tag{8.2}$$
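
A short numerical sketch may help fix the idea of a periodic product of rectangular matrices. The Python code below (NumPy) builds a hypothetical cycle of three phases with 2, 3, and 4 stages, forms the product starting from phase 1 and the cyclic permutation starting from phase 2, and compares their dominant eigenvalues. The matrices are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical cycle with p = 3 phases and 2, 3, and 4 stages in phases 1, 2, and 3
B1 = rng.random((3, 2))    # phase 1 -> phase 2
B2 = rng.random((4, 3))    # phase 2 -> phase 3
B3 = rng.random((2, 4))    # phase 3 -> phase 1

A1 = B3 @ B2 @ B1          # Eq. (8.2): projects from phase 1 to phase 1 (2 x 2)
A2 = B1 @ B3 @ B2          # cyclic permutation: phase 2 to phase 2 (3 x 3)

def growth_rate(A):
    """Dominant eigenvalue (in modulus) of the cycle matrix."""
    return max(abs(np.linalg.eigvals(A)))

lam1, lam2 = growth_rate(A1), growth_rate(A2)
```

The two products have different dimensions, but they share their non-zero eigenvalues, so `lam1` and `lam2` agree; the eigenvectors, in contrast, depend on the starting phase.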


The earliest studies of periodic matrix models were due to Darwin and Williams 
(1964), Skellam (1966), and MacArthur (1968). In recent years, with little fanfare, 
periodic models have emerged as an important tool for incorporating multiple 
processes within a single projection interval. Uses of periodic models include the 
following. 


1. Seasonal variation. Plants and animals experience obvious and dramatic seasonal 
variation in their demographic rates. Periodic models have been used to describe 
this variation, with seasons variously defined in terms of monthly periods, 
calendar seasons, or in terms of environmental events such as rainfall or flood 
patterns (e.g., Smith et al. 2005). 

Although annual or near-annual species are obvious candidates for periodic 
models, within-year time scales may also be important for long-lived species. For 
example, Hunter and Caswell (2005a) incorporated chick development events on 
a time scale of weeks into a periodic model for the sooty shearwater, which has a 
lifetime of decades. Similarly, Jenouvrier et al. (2010, 2014) have used periodic 
models to capture the timing of events in the breeding cycle within a portion of 
the year in the long-lived emperor penguin. 


¹ Although we will not address it in this chapter, the model (8.1) can be written in a way that
explicitly defines the starting phase in the cycle. As written, $A$ in (8.2) projects from phase 1 to
phase 1; if desired we could write this as $A_1$ and define matrices


$$ \begin{aligned} A_2 &= B_1 B_p \cdots B_2 \\ &\;\;\vdots \\ A_p &= B_{p-1} \cdots B_1 B_p \end{aligned} $$


The $A_i$ are obtained by cyclic permutations of the sequence $\{B_p, \ldots, B_1\}$; each of these projects
from a different phase in the cycle. Some demographic properties (e.g., the population growth rate
$\lambda$) are invariant with respect to such permutations; others (e.g., the eigenvectors) are not (Caswell
2001). In this chapter, we will start with phase 1 and refer to $A$ rather than $A_1$.

2. Periodic interannual variability. Periodic models based on sequences of annual
observations have been used to study effects of inter-event intervals, where events 
include fires, floods, ENSO events, etc. 

3. Harvest and management. These activities often take place at specified points 
within an annual or interannual cycle. Periodic models have been used to study
the effects of their timing; one of the earliest periodic models was devoted to
seasonal harvesting (Darwin and Williams 1964).

4. Conditional probabilities. Periodic matrix products appear when models are
written as products of conditional probabilities. In stage-classified models, for
example, a transition matrix $U$ is written as the product of a diagonal matrix
$\Sigma$ (with survival probabilities $\sigma$ on the diagonal) and a matrix $G$ of transition
probabilities conditional on survival:


$$ U = G \Sigma \tag{8.3} $$


which creates a period-2 periodic matrix product within the model. 

5. Multistate vec-permutation models. When individuals are classified by two 
or more criteria (e.g., stage and location), the dynamics over the projection 
interval can be described in terms of the processes affecting each criterion (e.g., 
transitions and movement). The result is a periodic model that uses the vec- 
permutation matrix to generate a block-structured projection matrix over the 
entire interval. See Chap. 6 for analysis of such models. 

6. Nonlinear models. Henson and Cushing (1997) developed a model for Tribolium 
in an experimental system in which container size was varied periodically. Shyu 
et al. (2013) developed a nonlinear seasonal model of an invasive plant to account 
for the timing of both density effects and management actions within the year. 
In such models, cyclic dynamics can be produced both by the environmental 
periodicity and the nonlinearities (e.g., Cushing 2006). 
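
The periodic product (8.2) and its cyclic permutations can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming made-up phase matrices `B1`, `B2`, `B3` (their entries and the helper `dominant_eig` are hypothetical and not taken from any of the cited studies). It illustrates that rectangular phase matrices are allowed, that the growth rate is the same whichever phase the cycle starts from, and that the stable structure depends on the starting phase.

```python
import numpy as np

# Hypothetical phase matrices for a period-3 cycle. The number of stages
# changes between phases (2 -> 3 -> 2 -> 2), so B1 and B2 are rectangular.
B1 = np.array([[0.5, 0.0],
               [0.3, 0.6],
               [0.0, 0.2]])          # 3 x 2, phase 1 -> phase 2
B2 = np.array([[0.9, 0.4, 0.0],
               [0.0, 0.5, 0.8]])     # 2 x 3, phase 2 -> phase 3
B3 = np.array([[0.2, 3.0],
               [0.7, 0.9]])          # 2 x 2, phase 3 -> phase 1


def dominant_eig(A):
    """Dominant eigenvalue and right eigenvector (scaled to sum to 1)."""
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)
    w = np.abs(vecs[:, k].real)
    return vals[k].real, w / w.sum()


A1 = B3 @ B2 @ B1    # projects from phase 1 to phase 1, as in Eq. (8.2)
A2 = B1 @ B3 @ B2    # cyclic permutation: phase 2 to phase 2
A3 = B2 @ B1 @ B3    # cyclic permutation: phase 3 to phase 3

lam1, w1 = dominant_eig(A1)
lam2, w2 = dominant_eig(A2)
lam3, w3 = dominant_eig(A3)
print(lam1, lam2, lam3)   # the three growth rates agree
print(w1, w2, w3)         # the stable structures differ by phase
```

In this sketch the stable structure at phase 2 is proportional to `B1 @ w1`, which is one way to move an eigenvector from one starting phase to the next without a second eigenvalue computation.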


8.1.1 Perturbation Analysis 


As in Fig. 8.1, we suppose that in phase $i$ of the cycle, the parameter vector takes on
the value $\theta_i$ and determines the matrix $B_i$. The projection matrix $A$ is the product,
in the specified order, of the $B_i$. Although the output $\xi$ is calculated from $A$, the
parameter dependence operates through the $B_i$ (Fig. 8.1). The sensitivity of $\xi$ to the
elements of $A$ is in general not of interest, because those elements are complicated
expressions involving the elements of all the $B_i$, and thus mix disparate biological
processes. Here we calculate the sensitivity of demographic outcomes to the entries
of the $B_i$.

In this chapter we analyze linear periodic models of the form (8.1) and the 
cyclic dynamics of nonlinear seasonal models with delayed density effects. We 
will briefly discuss the generalization of the multistate age × stage-classified models

Table 8.1 Table of symbols in this chapter

Symbol                              Meaning
$s_i$                               Number of stages at phase $i$ of the cycle
$p$                                 Period of the cycle
$q$                                 Dimension of parameter vector $\theta$
$r$                                 Number of locations in spatial model
$\theta_i$                          Parameter vector evaluated at phase $i$
$B_i$                               Projection matrix from phase $i$ to phase $i + 1$, or in location $i$
$C_i^j$                             Ordered product $B_j \cdots B_i$ of matrices from $i$ to $j$
$M_i$                               Dispersal matrix for stage $i$
$A$                                 Projection matrix over entire cycle
$A_i$                               Projection matrix over cycle, starting at phase $i$
$R_i$                               Matrix of LTRE contributions from phase $i$
$E_{ii}$                            $s \times s$ matrix with 1 in $(i, i)$ position and 0 elsewhere
$I_s$                               Identity matrix of dimension $s$
$D(x)$                              Diagonal matrix with $x$ on the diagonal
$1$                                 Vector of ones
$\mathbb{B}$, $\mathbb{M}$, etc.    Block-structured matrices
$\circ$                             Hadamard, or element-by-element product
$\otimes$                           Kronecker product


[Figure 8.1 (flow diagram): parameter vector $\theta$ → parameters at each phase $\theta_1, \ldots, \theta_p$ → matrices at each phase $B_1[\theta_1], \ldots, B_p[\theta_p]$ → matrix over entire projection cycle $A$ → output variable $\xi$]

Fig. 8.1 A vector $\theta$ of parameters determines an output variable $\xi$, which may be a scalar, vector,
or matrix. The parameter vector will generally take on different values at each phase in the cycle,
and determine the phase-specific matrix $B_i$. These matrices determine the projection matrix $A$ as
a periodic matrix product; the output variable is computed from $A$. The perturbation problem is to
compute the sensitivity or elasticity of $\xi$ to $\theta$


explored in Chap. 6 to an arbitrary number of classifications. We extend the LTRE 
decomposition analysis to the periodic case, making it possible to analyze effects of 
parameter changes at any point in a periodic environment.

8.2 Linear Models


Consider the basic model (8.1) with projection matrix (8.2). The period of the cycle
is $p$. To allow for differences in the state vector at different phases within the cycle,
define the number of stages at phase $i$ as $s_i$. Thus the matrix $B_i$ is of dimension
$s_{i+1} \times s_i$, with the subscript $i$ interpreted mod($p$) (that is, $(p + 1)$ mod($p$) $= 1$).
Let $\xi$ (dimension $m \times 1$) denote an output variable calculated from $A$, where
$\xi$ might be a scalar, a vector, or a vectorized matrix. Let $\theta$ be a parameter vector
(dimension $q \times 1$). The derivative of $\xi$ with respect to $\theta$ is the $m \times q$ matrix


$$ \frac{d\xi}{d\theta^{\mathsf T}} = \left( \frac{d\xi_i}{d\theta_j} \right) \qquad i = 1, \ldots, m; \quad j = 1, \ldots, q \tag{8.4} $$


By the chain rule, the effects of the parameters on & are captured in the matrix 
product 


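$$ \frac{d\xi}{d\theta^{\mathsf T}} = \frac{d\xi}{d\,\mathrm{vec}^{\mathsf T} A} \, \frac{d\,\mathrm{vec}\,A}{d\theta^{\mathsf T}}. \tag{8.5} $$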


The first term in (8.5) is the derivative of the output variable $\xi$ with respect to the
matrix $A$ from which it is calculated. The second term in (8.5) is the derivative of
the periodic product matrix $A$ with respect to the parameter vector $\theta$. To obtain this,
differentiate (8.2):


$$ \begin{aligned} dA ={}& B_p \cdots B_2 \, (dB_1) \\ &+ B_p \cdots (dB_2) \, B_1 \\ &+ \cdots \\ &+ (dB_p) \, B_{p-1} \cdots B_1 \end{aligned} \tag{8.6} $$


It is convenient to define the matrix $C_i^j$ as the ordered product (from right to left) of
the $B$ matrices from $i$ up to $j$:


$$ C_i^j = B_j \cdots B_i \qquad i \le j \tag{8.7} $$

and set $C_1^0 = C_{p+1}^p = I_{s_1}$. Then (8.6) becomes

$$ dA = C_2^p \, (dB_1) + C_3^p \, (dB_2) \, C_1^1 + \cdots + (dB_p) \, C_1^{p-1} \tag{8.8} $$


Applying the vec operator to both sides gives


$$ d\,\mathrm{vec}\,A = \sum_{i=1}^{p} \left[ \left( C_1^{i-1} \right)^{\mathsf T} \otimes C_{i+1}^{p} \right] d\,\mathrm{vec}\,B_i \tag{8.9} $$

Equation (8.9) accounts automatically for the possibly different dimensions of the
$B_i$. The resulting derivative with respect to the parameter vector $\theta$ is


$$ \frac{d\,\mathrm{vec}\,A}{d\theta^{\mathsf T}} = \sum_{i=1}^{p} \left[ \left( C_1^{i-1} \right)^{\mathsf T} \otimes C_{i+1}^{p} \right] \frac{d\,\mathrm{vec}\,B_i}{d\theta^{\mathsf T}} \tag{8.10} $$


where $d\,\mathrm{vec}\,B_i / d\theta^{\mathsf T}$ is the derivative of the matrix $B_i$ with respect to the parameter
vector $\theta$, evaluated at $\theta_i$. Equation (8.10) sums the contributions of the derivatives
of all of the phase-specific matrices $B_i$ with respect to $\theta$, thus accounting for all the
ways in which $\theta$ may affect the demographic rates at each point in the cycle. As
written, (8.10) gives the result of perturbing $\theta$ at each point in the cycle. The effect
of a phase-specific perturbation is easily obtained by summing only over phases in
which $\theta_i$ is modified.
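
As a check on the bookkeeping in (8.7) to (8.10), the sum of Kronecker products can be assembled directly and compared with a finite-difference derivative of the periodic product. The sketch below is written in Python with NumPy; the two-phase model in `phase_matrices`, its parameter values, and the helper names `ordered_product`, `dvecA_dtheta`, and `jacobian` are all hypothetical, chosen only to exercise the formula with rectangular matrices.

```python
import numpy as np


def vec(M):
    """Stack the columns of M (column-major vec operator)."""
    return M.reshape(-1, order="F")


def ordered_product(Bs, i, j):
    """C_i^j = B_j ... B_i with 1-based phases; an identity matrix if j < i."""
    p = len(Bs)
    dim = Bs[i - 1].shape[1] if i <= p else Bs[p - 1].shape[0]
    C = np.eye(dim)
    for k in range(i, j + 1):
        C = Bs[k - 1] @ C
    return C


def dvecA_dtheta(Bs, dBs):
    """Eq. (8.10): sum over phases of [(C_1^{i-1})^T kron C_{i+1}^p] dvecB_i/dtheta^T."""
    p = len(Bs)
    total = 0
    for i in range(1, p + 1):
        right = ordered_product(Bs, 1, i - 1)
        left = ordered_product(Bs, i + 1, p)
        total = total + np.kron(right.T, left) @ dBs[i - 1]
    return total


def phase_matrices(theta):
    """Hypothetical period-2 model; theta = (survival s, fertility f)."""
    s, f = theta
    B1 = np.array([[s, 0.0], [0.3, 0.8], [0.0, 0.1 * s]])   # 3 x 2
    B2 = np.array([[0.9, 0.4, f], [0.0, s, 0.8]])           # 2 x 3
    return [B1, B2]


def jacobian(fun, theta, h=1e-6):
    """Central-difference derivative of vec(fun(theta)) with respect to theta^T."""
    cols = []
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        cols.append((vec(fun(theta + e)) - vec(fun(theta - e))) / (2 * h))
    return np.column_stack(cols)


theta = np.array([0.6, 2.0])
Bs = phase_matrices(theta)
dBs = [jacobian(lambda th, i=i: phase_matrices(th)[i], theta) for i in range(2)]

formula = dvecA_dtheta(Bs, dBs)     # Eq. (8.10)
direct = jacobian(lambda th: phase_matrices(th)[1] @ phase_matrices(th)[0], theta)
print(np.max(np.abs(formula - direct)))
```

To obtain the effect of a perturbation confined to one phase, the same function can be called with the derivative blocks of the other phases set to zero.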

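Substituting (8.10) into the formula (8.5) gives the general expression for the
sensitivity of $\xi$ to changes affecting any or all of the $B_i$:


$$ \frac{d\xi}{d\theta^{\mathsf T}} = \frac{d\xi}{d\,\mathrm{vec}^{\mathsf T} A} \left( \sum_{i=1}^{p} \left[ \left( C_1^{i-1} \right)^{\mathsf T} \otimes C_{i+1}^{p} \right] \frac{d\,\mathrm{vec}\,B_i}{d\theta^{\mathsf T}} \right). \tag{8.11} $$

The elasticity of $\xi$ to $\theta$ is the matrix

$$ \frac{\epsilon \xi}{\epsilon \theta^{\mathsf T}} = \left( \frac{\theta_j}{\xi_i} \frac{d\xi_i}{d\theta_j} \right) \tag{8.12} $$

$$ \phantom{\frac{\epsilon \xi}{\epsilon \theta^{\mathsf T}}} = D(\xi)^{-1} \, \frac{d\xi}{d\,\mathrm{vec}^{\mathsf T} A} \left( \sum_{i=1}^{p} \left[ \left( C_1^{i-1} \right)^{\mathsf T} \otimes C_{i+1}^{p} \right] \frac{d\,\mathrm{vec}\,B_i}{d\theta^{\mathsf T}} \right) D(\theta) \tag{8.13} $$


where $D(x)$ is a diagonal matrix with $x$ on the diagonal and zeros elsewhere.
Because elasticities are logarithmic derivatives, they apply only when $\xi > 0$ and
$\theta > 0$.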
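
A common choice of output is the growth rate $\lambda$, the dominant eigenvalue of $A$. The sketch below, in Python with NumPy, applies (8.11) and (8.13) with the parameters taken to be the entries of each phase matrix, so that $d\,\mathrm{vec}\,B_i / d\theta^{\mathsf T}$ is an identity matrix. It assumes the standard eigenvalue sensitivity $d\lambda / d\,\mathrm{vec}^{\mathsf T} A = (w^{\mathsf T} \otimes v^{\mathsf T}) / (v^{\mathsf T} w)$, with $w$ and $v$ the right and left eigenvectors; the phase matrices are hypothetical numbers.

```python
import numpy as np


def vec(M):
    """Stack the columns of M (column-major vec operator)."""
    return M.reshape(-1, order="F")


# Hypothetical period-3 phase matrices (2 -> 3 -> 2 -> 2 stages).
B1 = np.array([[0.5, 0.0], [0.3, 0.6], [0.0, 0.2]])
B2 = np.array([[0.9, 0.4, 0.0], [0.0, 0.5, 0.8]])
B3 = np.array([[0.2, 3.0], [0.7, 0.9]])
A = B3 @ B2 @ B1

# Growth rate with right (w) and left (v) eigenvectors of A.
vals, W = np.linalg.eig(A)
k = np.argmax(vals.real)
lam, w = vals[k].real, W[:, k].real
vals_left, V = np.linalg.eig(A.T)
v = V[:, np.argmax(vals_left.real)].real

# Assumed standard result: d lambda / d vec^T A = (w^T kron v^T) / (v^T w).
dlam_dvecA = np.kron(w, v) / (v @ w)

# Eq. (8.11), one term per phase, with theta = vec B_i.
I2 = np.eye(2)
sens = [
    dlam_dvecA @ np.kron(I2, B3 @ B2),        # phase 1: (C_1^0)^T kron C_2^3
    dlam_dvecA @ np.kron(B1.T, B3),           # phase 2: (C_1^1)^T kron C_3^3
    dlam_dvecA @ np.kron((B2 @ B1).T, I2),    # phase 3: (C_1^2)^T kron C_4^3
]

# Eq. (8.13): elasticities, entry by entry.
elas = [s * vec(B) / lam for s, B in zip(sens, (B1, B2, B3))]
for i, e in enumerate(elas, start=1):
    print(f"phase {i}: elasticities sum to {e.sum():.6f}")
```

Because $\lambda$ is homogeneous of degree one in each $B_i$ separately, the elasticities to the entries of any single phase matrix sum to one, which gives a quick consistency check on the calculation.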


8.2.1 A Simple Harvest Model 


The projection matrix for a simple harvest model (e.g., Hauser et al. 2006) can be
written


$$ A = B (I - H). \tag{8.14} $$


The matrix $B$ describes demography in the absence of harvest. The matrix $H = D(h)$
is a harvest matrix, where $h_i$ is the probability that an individual of stage $i$
is harvested.² Either $B$, $H$, or both may be functions of a vector $\theta$ of parameters.
Differentiating (8.14) and applying the vec operator gives


$$ d\,\mathrm{vec}\,A = - (I_s \otimes B) \, d\,\mathrm{vec}\,H + \left[ (I - H)^{\mathsf T} \otimes I_s \right] d\,\mathrm{vec}\,B. \tag{8.15} $$

The diagonal matrix $H$ can be written

$$ H = I_s \circ \left( 1_s h^{\mathsf T} \right) \tag{8.16} $$


where $1_s$ is an $s \times 1$ vector of ones and $\circ$ denotes the Hadamard product. The
differential of $H$ in (8.16) is


$$ d\,\mathrm{vec}\,H = D(\mathrm{vec}\,I_s) \, (I_s \otimes 1_s) \, dh. \tag{8.17} $$


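Combining (8.17) and (8.15) and applying the chain rule gives the derivative with
respect to $\theta$:


$$ \frac{d\,\mathrm{vec}\,A}{d\theta^{\mathsf T}} = \underbrace{- (I_s \otimes B) \, D(\mathrm{vec}\,I_s) \, (I_s \otimes 1_s) \, \frac{dh}{d\theta^{\mathsf T}}}_{\text{perturbations of } h} + \underbrace{\left[ (I - H)^{\mathsf T} \otimes I_s \right] \frac{d\,\mathrm{vec}\,B}{d\theta^{\mathsf T}}}_{\text{perturbations of } B} \tag{8.18} $$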
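
The harvest term of (8.18) is easy to verify numerically. The following Python/NumPy sketch takes the parameter vector to be the harvest probabilities themselves (so that $dh / d\theta^{\mathsf T}$ is an identity matrix) and compares the formula with a finite-difference derivative. The matrix `B` and vector `h` are hypothetical values, and `projection` is an illustrative helper.

```python
import numpy as np


def vec(M):
    """Stack the columns of M (column-major vec operator)."""
    return M.reshape(-1, order="F")


s = 3
B = np.array([[0.0, 1.5, 2.0],      # hypothetical demography without harvest
              [0.5, 0.0, 0.0],
              [0.0, 0.7, 0.8]])
h = np.array([0.0, 0.1, 0.3])       # hypothetical stage-specific harvest probabilities


def projection(h):
    """A = B (I - H) with H = D(h), Eq. (8.14)."""
    return B @ (np.eye(s) - np.diag(h))


I_s = np.eye(s)
ones = np.ones((s, 1))

# Eq. (8.17): dvec H = D(vec I_s) (I_s kron 1_s) dh
dvecH_dh = np.diag(vec(I_s)) @ np.kron(I_s, ones)

# First term of Eq. (8.18), with theta = h
dvecA_dh = -np.kron(I_s, B) @ dvecH_dh

# Finite-difference check, one harvest probability at a time
eps = 1e-6
fd = np.column_stack([
    (vec(projection(h + eps * I_s[j])) - vec(projection(h - eps * I_s[j]))) / (2 * eps)
    for j in range(s)
])
print(np.max(np.abs(dvecA_dh - fd)))
```

Multiplying `dvecA_dh` on the left by the derivative of an output with respect to vec A, as in (8.5), then gives the sensitivity of that output to each harvest probability.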
The conditional probability model (8.3) has the same form as the harvest
model (8.14), so a similar analysis applies to it as well:


$$ d\,\mathrm{vec}\,U = (I_s \otimes G) \, D(\mathrm{vec}\,I_s) \, (I_s \otimes 1_s) \, d\sigma + (\Sigma \otimes I_s) \, d\,\mathrm{vec}\,G. \tag{8.19} $$


However, the conditional transition matrix $G$ is column-stochastic (all columns
sum to 1), because all loss of individuals is accounted for by $\Sigma$. Thus relevant
perturbations must be parameterized so that the stochasticity is preserved. For
example, if $G$ describes growth in the standard size-classified model (Caswell 2001,
Section 4.2), e.g.,


$$ G = \begin{pmatrix} 1 - \gamma_1 & 0 & 0 \\ \gamma_1 & 1 - \gamma_2 & 0 \\ 0 & \gamma_2 & 1 \end{pmatrix} \tag{8.20} $$

then perturbations of the $\gamma_i$ will preserve stochasticity of $G$. If $G$ has no such
convenient parameterization, then changes in the entries of $G$ must be compensated
for by changes elsewhere in the same column (see Caswell 2001; Hill et al. 2004;
Theorem 4.5 of Caswell 2013). For explicit formulas for compensation, see Chap. 11
of this volume; for an application, see van Daalen and Caswell (2017).


² Alternatively, let $\mu_i$ be the mortality due to harvest experienced by an individual in stage $i$. Then
$H = \exp[-D(\mu)]$. Harvest imposes an additional, additive hazard on top of the natural mortality
contained in $B$.

The harvest model (8.14) can be extended to describe harvest imposed at a 
specified phase within a p-cycle. Suppose that harvest takes place between phase 
m and phase m + 1, so that 


$$ A = B_p \cdots B_{m+1} \, (I - H) \, B_m \cdots B_1 \tag{8.21} $$


(see Darwin and Williams (1964) for an early example of this kind of seasonal 
harvest model). Using the same approach, it can be shown that 


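$$ \begin{aligned} \frac{d\,\mathrm{vec}\,A}{d\theta^{\mathsf T}} ={}& - \left[ \left( C_1^{m} \right)^{\mathsf T} \otimes C_{m+1}^{p} \right] \frac{d\,\mathrm{vec}\,H}{d\theta^{\mathsf T}} \\ &+ \left[ I_{s_1} \otimes C_{m+1}^{p} (I - H) \right] \sum_{i=1}^{m} \left[ \left( C_1^{i-1} \right)^{\mathsf T} \otimes C_{i+1}^{m} \right] \frac{d\,\mathrm{vec}\,B_i}{d\theta^{\mathsf T}} \\ &+ \left[ \left( (I - H) \, C_1^{m} \right)^{\mathsf T} \otimes I_{s_1} \right] \sum_{i=m+1}^{p} \left[ \left( C_{m+1}^{i-1} \right)^{\mathsf T} \otimes C_{i+1}^{p} \right] \frac{d\,\mathrm{vec}\,B_i}{d\theta^{\mathsf T}} \end{aligned} \tag{8.22} $$


The expression (8.17) can be substituted for $d\,\mathrm{vec}\,H$ in (8.22), and the resulting
expression for $d\,\mathrm{vec}\,A / d\theta^{\mathsf T}$ substituted into (8.5).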


8.3 Multistate Models 


We have encountered several examples of models in which individuals are classified
by two criteria (age and stage, stage and environmental state, stage and location,
etc.). These multistate models can be constructed by the vec-permutation matrix
approach; see Chaps. 5 and 6 or Hunter and Caswell (2005b) and Caswell et al.
(2018).

Suppose individuals are classified by two criteria; e.g., stages $(1, \ldots, s)$ and
locations $(1, \ldots, r)$. One might describe population dynamics in terms of stage
transitions within locations, and spatial movement within stages, with the two processes
acting sequentially. Thus individuals first survive and reproduce according to their
stage-specific demography, and then disperse among locations, and then repeat. Let
$B_i$ be the $s \times s$ matrix describing transitions and reproduction within location $i$, and
$M_j$ the $r \times r$ matrix describing movement probabilities for stage $j$. Let $\mathbb{B}$ and $\mathbb{M}$
be the $sr \times sr$ block diagonal matrices with the $B_i$ and the $M_j$, respectively, on the
diagonal.

The population is projected by


$$ \mathbf{n}(t + 1) = K^{\mathsf T} \mathbb{M} K \mathbb{B} \, \mathbf{n}(t). \tag{8.23} $$

The matrix $K$ is the vec-permutation matrix, or commutation matrix (Henderson
and Searle 1981; Magnus and Neudecker 1979), which satisfies


$$ \mathrm{vec}\,\mathcal{N}^{\mathsf T} = K \, \mathrm{vec}\,\mathcal{N} \tag{8.24} $$

for an $s \times r$ array $\mathcal{N}$. For the calculation of $K$, see Sect. 2.2.3.


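The model (8.23) is formally periodic, with the operation of $\mathbb{B}$ and $\mathbb{M}$ alternating;
thus the projection matrix is


$$ A = K^{\mathsf T} \mathbb{M} K \mathbb{B}. \tag{8.25} $$


The dependence of $A$ on the parameters $\theta$ can take place through $B_i[\theta]$, $M_j[\theta]$, or
both.

The general sensitivity formula (8.5) requires the derivative $d\,\mathrm{vec}\,A / d\theta^{\mathsf T}$.
Differentiating (8.25) gives


$$ dA = K^{\mathsf T} (d\mathbb{M}) K \mathbb{B} + K^{\mathsf T} \mathbb{M} K (d\mathbb{B}). \tag{8.26} $$

Applying the vec operator gives

$$ d\,\mathrm{vec}\,A = \left( \mathbb{B}^{\mathsf T} K^{\mathsf T} \otimes K^{\mathsf T} \right) d\,\mathrm{vec}\,\mathbb{M} + \left( I_{sr} \otimes K^{\mathsf T} \mathbb{M} K \right) d\,\mathrm{vec}\,\mathbb{B}. \tag{8.27} $$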


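We want to express $d\,\mathrm{vec}\,\mathbb{B}$ and $d\,\mathrm{vec}\,\mathbb{M}$ in terms of the derivatives of their diagonal
blocks $B_i$ and $M_i$. This can be done using equations (14) and (15) of Caswell and
van Daalen (2016). Define the matrices $P_i$ and $Q_i$, of dimension $rs \times s$ and $s \times rs$,
respectively,


$$ P_i = \begin{pmatrix} 0_{s(i-1) \times s} \\ I_s \\ 0_{s(r-i) \times s} \end{pmatrix}, \qquad Q_i = \begin{pmatrix} 0_{s \times s(i-1)} & I_s & 0_{s \times s(r-i)} \end{pmatrix}. \tag{8.28} $$

Then

$$ d\,\mathrm{vec}\,\mathbb{B} = \sum_{i=1}^{r} \left( Q_i^{\mathsf T} \otimes P_i \right) d\,\mathrm{vec}\,B_i. \tag{8.29} $$


Similarly, for $\mathbb{M}$, define matrices $R_i$ and $S_i$


$$ R_i = \begin{pmatrix} 0_{r(i-1) \times r} \\ I_r \\ 0_{r(s-i) \times r} \end{pmatrix}, \qquad S_i = \begin{pmatrix} 0_{r \times r(i-1)} & I_r & 0_{r \times r(s-i)} \end{pmatrix}. \tag{8.30} $$

Then

$$ d\,\mathrm{vec}\,\mathbb{M} = \sum_{i=1}^{s} \left( S_i^{\mathsf T} \otimes R_i \right) d\,\mathrm{vec}\,M_i. \tag{8.31} $$


Substituting (8.29) and (8.31) into the expression (8.27) for $d\,\mathrm{vec}\,A$ gives the final
result


$$ \frac{d\,\mathrm{vec}\,A}{d\theta^{\mathsf T}} = \underbrace{X_1 \sum_{i=1}^{s} \left( S_i^{\mathsf T} \otimes R_i \right) \frac{d\,\mathrm{vec}\,M_i}{d\theta^{\mathsf T}}}_{\text{perturbations of the } M_i} + \underbrace{X_2 \sum_{i=1}^{r} \left( Q_i^{\mathsf T} \otimes P_i \right) \frac{d\,\mathrm{vec}\,B_i}{d\theta^{\mathsf T}}}_{\text{perturbations of the } B_i} \tag{8.32} $$


where $X_1$ and $X_2$ are constant matrices,


$$ X_1 = \left( \mathbb{B}^{\mathsf T} K^{\mathsf T} \otimes K^{\mathsf T} \right) \tag{8.33} $$
$$ X_2 = \left( I_{sr} \otimes K^{\mathsf T} \mathbb{M} K \right) \tag{8.34} $$


that need be calculated only once. Although $X_1$, $X_2$, and the Kronecker products
appearing in the summations are large, they are also extremely sparse. The sparse
matrix capabilities in MATLAB can take advantage of this fact. Substituting (8.32)
into (8.5) gives the sensitivity of an output variable $\xi$ to changes in parameters that
perturb any or all of the $M_i$ and $B_i$.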
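
The pieces of (8.23) to (8.34) can be assembled in a few lines. The sketch below uses Python with NumPy and dense arrays for readability; a realistic application would presumably use sparse storage. The sizes, the random matrices, and the helper names `vec_permutation` and `block_selectors` are hypothetical, and the index-based construction of $K$ is one possible construction, not necessarily the one given in Sect. 2.2.3.

```python
import numpy as np


def vec(M):
    """Stack the columns of M (column-major vec operator)."""
    return M.reshape(-1, order="F")


def vec_permutation(s, r):
    """K such that vec(X.T) = K vec(X) for any s x r array X, as in Eq. (8.24)."""
    K = np.zeros((s * r, s * r))
    for i in range(s):
        for j in range(r):
            # X[i, j] sits at i + s*j in vec(X) and at j + r*i in vec(X.T)
            K[j + r * i, i + s * j] = 1.0
    return K


def block_selectors(n_blocks, size):
    """Tall P_i and wide Q_i = P_i^T of Eqs. (8.28) and (8.30)."""
    P = [np.kron(np.eye(n_blocks)[:, [i]], np.eye(size)) for i in range(n_blocks)]
    return P, [Pi.T for Pi in P]


s, r = 2, 3                                   # stages, locations (hypothetical)
rng = np.random.default_rng(0)
B_loc = [rng.uniform(0, 1, (s, s)) for _ in range(r)]        # demography per location
M_stage = []
for _ in range(s):                                            # movement per stage
    M = rng.uniform(0.1, 1.0, (r, r))
    M_stage.append(M / M.sum(axis=0))                         # columns sum to 1

P, Q = block_selectors(r, s)
R, S = block_selectors(s, r)
bbB = sum(P[i] @ B_loc[i] @ Q[i] for i in range(r))           # block-diagonal B
bbM = sum(R[j] @ M_stage[j] @ S[j] for j in range(s))         # block-diagonal M
K = vec_permutation(s, r)
A = K.T @ bbM @ K @ bbB                                       # Eq. (8.25)

X1 = np.kron(bbB.T @ K.T, K.T)                                # Eq. (8.33)
X2 = np.kron(np.eye(s * r), K.T @ bbM @ K)                    # Eq. (8.34)

# Terms of Eq. (8.32) for location 1 and stage 1, with theta equal to the
# entries of B_1 or M_1 (so the inner derivative is an identity matrix).
dvecA_dvecB1 = X2 @ np.kron(Q[0].T, P[0])
dvecA_dvecM1 = X1 @ np.kron(S[0].T, R[0])
print(dvecA_dvecB1.shape, dvecA_dvecM1.shape)
```

Because $A$ is linear in each $B_i$ and each $M_j$ separately, these derivative blocks reproduce the change in vec A exactly for a perturbation of a single block, which makes them straightforward to test.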


8.4 Nonlinear Models and Delayed Density Dependence 


Anticipating the more extensive treatment in Chap. 10, we consider the effects 
of nonlinearity in periodic models. You may want to return to this section after 
Sect. 10.7, which analyzes periodic oscillations arising from time-invariant non- 
linearities. When periodic environmental changes interact with such oscillations, 
the results can be complicated, and such interactions are the focus of the present 
section. 

In a periodic nonlinear model, each of the $B_i$ in (8.2) may depend on density.
Especially in seasonal models, the vital rates in the matrix $B_i$ may depend on
densities not only at phase $i$, but at previous phases within the cycle as well. For
example, in a study of the invasive plant garlic mustard (Alliaria petiolata) Shyu
et al. (2013) found that seed production of fruiting plants in the fall reflected the
density experienced by vegetative rosettes in the early spring.

To develop a model including such delayed density dependence, define


$$ \mathbf{n}_i(t) = \text{population at season } i \text{ in year } t \tag{8.35} $$

Starting at season 1, the dynamics are given by


$$ \begin{aligned} \mathbf{n}_1(t + 1) &= B_p \, \mathbf{n}_p(t) \\ \mathbf{n}_2(t) &= B_1 \, \mathbf{n}_1(t) \\ &\;\;\vdots \\ \mathbf{n}_p(t) &= B_{p-1} \, \mathbf{n}_{p-1}(t) \end{aligned} \tag{8.36} $$


Density-dependence, in a general form, means that the matrices $B_i$ may be functions
of densities over one cycle prior to season $i$:


$$ \begin{aligned} B_1 &= B_1 \left[ \mathbf{n}_1(t), \mathbf{n}_p(t - 1), \ldots, \mathbf{n}_2(t - 1) \right] \\ B_2 &= B_2 \left[ \mathbf{n}_2(t), \mathbf{n}_1(t), \mathbf{n}_p(t - 1), \ldots, \mathbf{n}_3(t - 1) \right] \\ &\;\;\vdots \\ B_p &= B_p \left[ \mathbf{n}_p(t), \mathbf{n}_{p-1}(t), \ldots, \mathbf{n}_1(t) \right] \end{aligned} \tag{8.37} $$

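A fixed point on the interannual time scale, from $t$ to $t + 1$, is a $p$-cycle on the
seasonal scale, satisfying


$$ \begin{aligned} \hat{\mathbf{n}}_1 &= B_p \left[ \hat{\mathbf{n}}_1, \ldots, \hat{\mathbf{n}}_p \right] \hat{\mathbf{n}}_p \\ \hat{\mathbf{n}}_2 &= B_1 \left[ \hat{\mathbf{n}}_1, \ldots, \hat{\mathbf{n}}_p \right] \hat{\mathbf{n}}_1 \\ &\;\;\vdots \\ \hat{\mathbf{n}}_p &= B_{p-1} \left[ \hat{\mathbf{n}}_1, \ldots, \hat{\mathbf{n}}_p \right] \hat{\mathbf{n}}_{p-1} \end{aligned} \tag{8.38} $$


A $k$-cycle on the interannual time scale is a $kp$-cycle on the seasonal time scale, the
points of which are numbered $\hat{\mathbf{n}}_1, \ldots, \hat{\mathbf{n}}_{kp}$. The corresponding sequence of matrices,
in which the annual cycle $B_1, \ldots, B_p$ is repeated $k$ times, is defined as $B_1, \ldots, B_{kp}$.
With this notation, (8.38) still holds, with $kp$ instead of $p$ entries.

Differentiating (8.38) yields


$$ d\hat{\mathbf{n}}_i = (dB_{i-1}) \, \hat{\mathbf{n}}_{i-1} + B_{i-1} \, (d\hat{\mathbf{n}}_{i-1}) \qquad i = 1, \ldots, p \tag{8.39} $$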


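where the subscripts on $\hat{\mathbf{n}}$ and $B$ are interpreted modulo $p$. Applying the vec operator
to (8.39) yields


$$ d\hat{\mathbf{n}}_i = \left( \hat{\mathbf{n}}_{i-1}^{\mathsf T} \otimes I_s \right) d\,\mathrm{vec}\,B_{i-1} + B_{i-1} \, d\hat{\mathbf{n}}_{i-1}. \tag{8.40} $$

The sensitivity analysis of the cycle involves a set of block-structured matrices,
the form of which is easily generalized from the special case with $p = 3$. Assuming
$p = 3$ and noting that $B$ depends on all the $\mathbf{n}_i$ as well as on the parameter vector $\theta$,
the total differential of $B_{i-1}$ in (8.40) is


$$ d\,\mathrm{vec}\,B_{i-1} = \frac{\partial\,\mathrm{vec}\,B_{i-1}}{\partial \mathbf{n}_1^{\mathsf T}} \, d\hat{\mathbf{n}}_1 + \frac{\partial\,\mathrm{vec}\,B_{i-1}}{\partial \mathbf{n}_2^{\mathsf T}} \, d\hat{\mathbf{n}}_2 + \frac{\partial\,\mathrm{vec}\,B_{i-1}}{\partial \mathbf{n}_3^{\mathsf T}} \, d\hat{\mathbf{n}}_3 + \frac{\partial\,\mathrm{vec}\,B_{i-1}}{\partial \theta^{\mathsf T}} \, d\theta \tag{8.41} $$

For notational convenience, define the matrices

$$ H_i = \left( \hat{\mathbf{n}}_i^{\mathsf T} \otimes I_s \right) \qquad i = 1, \ldots, p \tag{8.42} $$


Substituting (8.41) into (8.40) produces the set of equations


$$ \begin{aligned} d\hat{\mathbf{n}}_1 &= H_3 \frac{\partial\,\mathrm{vec}\,B_3}{\partial \theta^{\mathsf T}} \, d\theta + H_3 \sum_{j=1}^{3} \frac{\partial\,\mathrm{vec}\,B_3}{\partial \mathbf{n}_j^{\mathsf T}} \, d\hat{\mathbf{n}}_j + B_3 \, d\hat{\mathbf{n}}_3 \\ d\hat{\mathbf{n}}_2 &= H_1 \frac{\partial\,\mathrm{vec}\,B_1}{\partial \theta^{\mathsf T}} \, d\theta + H_1 \sum_{j=1}^{3} \frac{\partial\,\mathrm{vec}\,B_1}{\partial \mathbf{n}_j^{\mathsf T}} \, d\hat{\mathbf{n}}_j + B_1 \, d\hat{\mathbf{n}}_1 \\ d\hat{\mathbf{n}}_3 &= H_2 \frac{\partial\,\mathrm{vec}\,B_2}{\partial \theta^{\mathsf T}} \, d\theta + H_2 \sum_{j=1}^{3} \frac{\partial\,\mathrm{vec}\,B_2}{\partial \mathbf{n}_j^{\mathsf T}} \, d\hat{\mathbf{n}}_j + B_2 \, d\hat{\mathbf{n}}_2 \end{aligned} \tag{8.43} $$


This set of equations can be reduced to a single equation by collecting all the points
on the $kp$-cycle into a single vector. Write an array (of dimension $sp \times k$)


$$ \mathcal{N} = \begin{array}{c|ccc} & \text{yr. } 1 & \cdots & \text{yr. } k \\ \hline \text{season } 1 & \hat{\mathbf{n}}_1 & \cdots & \hat{\mathbf{n}}_{(k-1)p+1} \\ \vdots & \vdots & & \vdots \\ \text{season } p & \hat{\mathbf{n}}_p & \cdots & \hat{\mathbf{n}}_{kp} \end{array} \tag{8.44} $$


Then write the vector (of dimension $spk \times 1$)

$$ N = \mathrm{vec}\,\mathcal{N} \tag{8.45} $$


In terms of this vector, the set of equations (8.43) can be rewritten


$$ \frac{dN}{d\theta^{\mathsf T}} = \left[ I_{spk} - \mathbb{B} - \mathbb{H} \mathbb{C} \right]^{-1} \mathbb{H} \mathbb{D}. \tag{8.46} $$

where $\mathbb{H}$ and $\mathbb{B}$ are the block-circulant matrices


$$ \mathbb{H} = \begin{pmatrix} 0 & 0 & H_3 \\ H_1 & 0 & 0 \\ 0 & H_2 & 0 \end{pmatrix} \tag{8.47} $$

$$ \mathbb{B} = \begin{pmatrix} 0 & 0 & B_3 \\ B_1 & 0 & 0 \\ 0 & B_2 & 0 \end{pmatrix}, \tag{8.48} $$


and $\mathbb{C}$ and $\mathbb{D}$ are the block matrices


$$ \mathbb{C} = \begin{pmatrix} \dfrac{\partial\,\mathrm{vec}\,B_1}{\partial \mathbf{n}_1^{\mathsf T}} & \cdots & \dfrac{\partial\,\mathrm{vec}\,B_1}{\partial \mathbf{n}_3^{\mathsf T}} \\ \vdots & & \vdots \\ \dfrac{\partial\,\mathrm{vec}\,B_3}{\partial \mathbf{n}_1^{\mathsf T}} & \cdots & \dfrac{\partial\,\mathrm{vec}\,B_3}{\partial \mathbf{n}_3^{\mathsf T}} \end{pmatrix} \tag{8.49} $$

$$ \mathbb{D} = \begin{pmatrix} \dfrac{\partial\,\mathrm{vec}\,B_1}{\partial \theta^{\mathsf T}} \\ \dfrac{\partial\,\mathrm{vec}\,B_2}{\partial \theta^{\mathsf T}} \\ \dfrac{\partial\,\mathrm{vec}\,B_3}{\partial \theta^{\mathsf T}} \end{pmatrix} \tag{8.50} $$


All the derivatives are evaluated at $\hat{\mathbf{n}}_1, \ldots, \hat{\mathbf{n}}_3$.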


8.4.1 Averages 


The vector $dN / d\theta^{\mathsf T}$ created by (8.46) contains the sensitivities of all $s$ stages, at each
of $p$ seasons within the year, for each of the $k$ years within the inter-annual $k$-cycle.
If this is too much information, one can calculate the sensitivity of averages, or other
linear combinations, taken in various ways.

To write these averages, let $b_m$ be an $m \times 1$ vector of weights. For a simple
average of $m$ quantities, each entry of $b_m$ is $1/m$; for a weighted average, the
entries of $b_m$ would be non-negative numbers summing to 1. More generally, $b$ may
contain arbitrary weights, such as biomass, metabolic rate, economic value, etc. See
Chaps. 7 and 10. To calculate averages from $N$, first apply these vectors to average
over rows or columns of $\mathcal{N}$ and then apply the vec operator to express the results as
averages over $N$.

Annual fixed point. If the dynamics are a fixed point on the annual time scale,
averages can be calculated over stages (using a vector $b_s$), over seasons (using a
vector $b_p$), or both. With $\mathcal{N}$ arranged as an $s \times p$ array (stages by seasons), the
$p \times 1$ vector of averages over stages is


$$ \text{avg. over stages} = \mathrm{vec} \left( b_s^{\mathsf T} \mathcal{N} \right) \tag{8.51} $$
$$ \phantom{\text{avg. over stages}} = \left( I_p \otimes b_s^{\mathsf T} \right) N \tag{8.52} $$


The $s \times 1$ vector of averages over seasons is

$$ \text{avg. over seasons} = \left( b_p^{\mathsf T} \otimes I_s \right) N \tag{8.53} $$

The average over both seasons and stages (a scalar) is

$$ \text{avg. over stages and seasons} = \left( b_p^{\mathsf T} \otimes b_s^{\mathsf T} \right) N \tag{8.54} $$


Because the average is a linear operator, the sensitivities of these averages are
obtained by applying the same weights to the derivative $dN / d\theta^{\mathsf T}$ in (8.46):


$$ \text{sensitivity of avg. over stages} = \left( I_p \otimes b_s^{\mathsf T} \right) \frac{dN}{d\theta^{\mathsf T}} \tag{8.55} $$
$$ \text{sensitivity of avg. over seasons} = \left( b_p^{\mathsf T} \otimes I_s \right) \frac{dN}{d\theta^{\mathsf T}} \tag{8.56} $$
$$ \text{sensitivity of avg. over both} = \left( b_p^{\mathsf T} \otimes b_s^{\mathsf T} \right) \frac{dN}{d\theta^{\mathsf T}} \tag{8.57} $$

Annual $k$-cycle. When the dynamics produce a $k$-cycle on the annual time scale,
averages can be calculated over any desired combination of stages, seasons, and
years. Table 8.2 gives the resulting expressions for the averages. As in the case of
equations (8.55) and (8.56), the sensitivities of these averages to parameters are
obtained by applying the same weights to $dN / d\theta^{\mathsf T}$.
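
The Kronecker-product weights can be checked against ordinary array averages. The sketch below, in Python with NumPy, builds a made-up attractor (random numbers standing in for a $kp$-cycle) and confirms that four of the operators listed in Table 8.2 reproduce the corresponding means. The sizes `s`, `p`, `k`, the array `cycle`, and the helper `kron3` are hypothetical.

```python
import numpy as np

s, p, k = 2, 3, 2                     # stages, seasons, years (hypothetical sizes)
rng = np.random.default_rng(1)

# Hypothetical attractor: cycle[y, i] is the s-vector at season i of year y.
cycle = rng.uniform(1.0, 10.0, (k, p, s))

# N stacks stages within seasons within years, i.e. N = vec of the
# (sp x k) array in Eq. (8.44): year varies slowest, stage fastest.
N = cycle.reshape(-1)

b_s = np.full((1, s), 1 / s)          # row vectors b^T of equal weights
b_p = np.full((1, p), 1 / p)
b_k = np.full((1, k), 1 / k)
I_s, I_p, I_k = np.eye(s), np.eye(p), np.eye(k)


def kron3(a, b, c):
    """Kronecker product of three factors, a kron b kron c."""
    return np.kron(np.kron(a, b), c)


avg_stages = kron3(I_k, I_p, b_s) @ N      # one value per season and year (kp x 1)
avg_seasons = kron3(I_k, b_p, I_s) @ N     # k stacked vectors of dimension s
avg_years = kron3(b_k, I_p, I_s) @ N       # p stacked vectors of dimension s
avg_all = kron3(b_k, b_p, b_s) @ N         # a single number
print(avg_stages, avg_seasons, avg_years, avg_all)
```

Since each operator is a constant matrix, multiplying it into $dN / d\theta^{\mathsf T}$ instead of $N$ gives the sensitivity of the corresponding average, as stated in the text.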


8.4.2 A Nonlinear Example 


As an example of the calculations for nonlinear systems, imagine an organism
with two stages: immature juveniles and reproducing adults. Suppose that the year
contains two seasons: a benign, reproduction-heavy Season 1 and a harsh, mortality-
heavy Season 2. The life cycle graph is shown in Fig. 8.2. Adults in Season 1 all
survive to Season 2 and give birth to new juveniles with per-capita fertility $f$, which
depends on adult density in Season 1 according to $f[\mathbf{n}] = a e^{-b n_2}$, where $a$ and $b$
are the maximum fertility and the strength of density-dependence, respectively, and
$n_2$ is the adult density in Season 1. In the harsher Season 2, juveniles and adults
survive with probabilities $s_j$ and $s_a$. A juvenile that survives to Season 1 matures
into an adult.

Table 8.2 Calculation of averages of attractors of nonlinear periodic matrix population models.
The upper half of the table shows averages over stages and over seasons when the dynamics are
a fixed point on the inter-annual time scale, and thus a p-cycle on the seasonal time scale. The
lower half of the table shows averages over all combinations of stages, seasons, and years, when
the dynamics are a k-cycle on the inter-annual time scale, and thus a kp-cycle on the seasonal time
scale

Average over             Formula                                                                   Vectors   Dimension
Stages                   $(I_p \otimes b_s^{\mathsf T}) N$                                         1         $p \times 1$
Seasons                  $(b_p^{\mathsf T} \otimes I_s) N$                                         1         $s \times 1$
Seasons and stages       $(b_p^{\mathsf T} \otimes b_s^{\mathsf T}) N$                             1         $1 \times 1$

Stages                   $(I_k \otimes I_p \otimes b_s^{\mathsf T}) N$                             1         $kp \times 1$
Seasons                  $(I_k \otimes b_p^{\mathsf T} \otimes I_s) N$                             $k$       $s \times 1$
Years                    $(b_k^{\mathsf T} \otimes I_p \otimes I_s) N$                             $p$       $s \times 1$
Seasons and years        $(b_k^{\mathsf T} \otimes b_p^{\mathsf T} \otimes I_s) N$                 1         $s \times 1$
Seasons and stages       $(I_k \otimes b_p^{\mathsf T} \otimes b_s^{\mathsf T}) N$                 1         $k \times 1$
Years and stages         $(b_k^{\mathsf T} \otimes I_p \otimes b_s^{\mathsf T}) N$                 1         $p \times 1$
Stages, seasons, years   $(b_k^{\mathsf T} \otimes b_p^{\mathsf T} \otimes b_s^{\mathsf T}) N$     1         $1 \times 1$


Fig. 8.2 A periodic life cycle graph for a simple two-stage, two-season nonlinear model.
J and A denote juveniles and adults, respectively

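This life cycle produces seasonal transition matrices $B_1$ and $B_2$:


$$ B_1[\mathbf{n}] = \begin{pmatrix} 0 & f[\mathbf{n}] \\ 0 & 1 \end{pmatrix} \tag{8.58} $$


$$ B_2 = \begin{pmatrix} 0 & 0 \\ s_j & s_a \end{pmatrix} \tag{8.59} $$

and the nonlinear periodic model


$$ \begin{aligned} \mathbf{n}_1(t + 1) &= B_2 \, \mathbf{n}_2(t) \\ \mathbf{n}_2(t) &= B_1 \left[ \mathbf{n}_1(t) \right] \mathbf{n}_1(t) \end{aligned} \tag{8.60} $$


Figure 8.3 is a bifurcation diagram for the system (8.60) in response to changes in
adult survival $s_a$. When adults are long-lived ($s_a \gtrsim 0.22$) there is a 2-cycle on the
seasonal scale, corresponding to a fixed point on the annual time scale, satisfying

$$ \hat{\mathbf{n}}_1 = B_2 \, \hat{\mathbf{n}}_2 \tag{8.61} $$
$$ \hat{\mathbf{n}}_2 = B_1 \left[ \hat{\mathbf{n}}_1 \right] \hat{\mathbf{n}}_1 \tag{8.62} $$


At $s_a \approx 0.22$ this 2-cycle bifurcates to a 4-cycle on the seasonal time scale,
corresponding to a 2-cycle on the annual scale.

Fig. 8.3 A bifurcation diagram on the seasonal time scale for the two-season, two-stage model
of Fig. 8.2. Total densities are plotted for Season 1 (•) and 2 (+). Parameters: $a = 20$, $b = 1$,
$s_j = 0.5$; $s_a$ varied from 0 to 1

To derive the block matrices $\mathbb{C}$ and $\mathbb{D}$ in Eqs. (8.49) and (8.50), define the
parameter vector as


$$ \theta = \begin{pmatrix} s_j & s_a & a & b \end{pmatrix}^{\mathsf T}. \tag{8.63} $$

The derivative matrices are


$$ \frac{d\,\mathrm{vec}\,B_1}{d\theta^{\mathsf T}} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & e^{-b n_2} & -a n_2 e^{-b n_2} \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{8.64} $$


$$ \frac{d\,\mathrm{vec}\,B_2}{d\theta^{\mathsf T}} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \tag{8.65} $$


$$ \frac{d\,\mathrm{vec}\,B_1}{d\mathbf{n}^{\mathsf T}} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & -a b e^{-b n_2} \\ 0 & 0 \end{pmatrix} \tag{8.66} $$


$$ \frac{d\,\mathrm{vec}\,B_2}{d\mathbf{n}^{\mathsf T}} = 0 \quad (\text{dimension } 4 \times 2) \tag{8.67} $$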

We calculate the sensitivities of the equilibrium population at each phase of the
cycle using Eq. (8.46) with $s_a = 0.4$ (a 2-cycle on the seasonal time scale; see
Fig. 8.3) and with $s_a = 0.1$ (a 4-cycle on the seasonal time scale). The results, and
the sensitivities of several averages, are shown in Fig. 8.4.
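
For the case $s_a = 0.4$ the calculation in Eq. (8.46) can be reproduced in a few lines. The sketch below is written in Python with NumPy and is not the MATLAB supplement mentioned later in this section. It assumes the two-season model as described in the text, with juveniles absent at the start of Season 1 (so the first column of $B_1$ is taken to be zero), and it uses the closed-form annual fixed point that follows from that assumption. The function name `cycle_and_sensitivity` is hypothetical.

```python
import numpy as np


def cycle_and_sensitivity(sj, sa, a, b):
    """Seasonal 2-cycle and dN/dtheta^T from Eq. (8.46), theta = (sj, sa, a, b)."""
    # Annual fixed point: adults x in Season 1 satisfy 1 = sj*a*exp(-b*x) + sa.
    x = -np.log((1 - sa) / (sj * a)) / b
    f = a * np.exp(-b * x)
    n1 = np.array([0.0, x])            # Season 1: (juveniles, adults)
    n2 = np.array([f * x, x])          # Season 2
    B1 = np.array([[0.0, f], [0.0, 1.0]])
    B2 = np.array([[0.0, 0.0], [sj, sa]])

    I2, Z = np.eye(2), np.zeros
    H1 = np.kron(n1[None, :], I2)      # Eq. (8.42), 2 x 4
    H2 = np.kron(n2[None, :], I2)
    bbH = np.block([[Z((2, 4)), H2], [H1, Z((2, 4))]])
    bbB = np.block([[Z((2, 2)), B2], [B1, Z((2, 2))]])

    # Eq. (8.49): only B1 depends on density (adult density in Season 1).
    C11 = Z((4, 2))
    C11[2, 1] = -a * b * np.exp(-b * x)
    bbC = np.block([[C11, Z((4, 2))], [Z((4, 2)), Z((4, 2))]])

    # Eq. (8.50): parameter derivatives of vec B1 and vec B2.
    D1 = Z((4, 4))
    D1[2, 2] = np.exp(-b * x)
    D1[2, 3] = -a * x * np.exp(-b * x)
    D2 = Z((4, 4))
    D2[1, 0] = 1.0
    D2[3, 1] = 1.0
    bbD = np.vstack([D1, D2])

    dN = np.linalg.solve(np.eye(4) - bbB - bbH @ bbC, bbH @ bbD)
    return np.concatenate([n1, n2]), dN


theta = np.array([0.5, 0.4, 20.0, 1.0])
N, dN = cycle_and_sensitivity(*theta)

# Sensitivity of total density in each season (stage weights of 1).
d_totals = np.kron(np.eye(2), np.ones((1, 2))) @ dN
print(N)
print(d_totals)
```

Under these assumptions the sensitivity of Season 1 adult density to $s_a$ has the closed form $1 / (b (1 - s_a))$, which offers an independent check on the matrix calculation.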

At the seasonal 2-cycle (annual fixed point), increases in $s_j$ or $s_a$ increase
density in Season 1 and reduce density in Season 2, and have little effect on the
density averaged over seasons. The maximum fertility level $a$ has little effect at
either season, and the density-dependent parameter $b$ has large negative effects
throughout.

[Figure 8.4: bar charts of sensitivities to $s_j$, $s_a$, $a$, and $b$ (horizontal axis), for $s_a = 0.4$ and $s_a = 0.1$. Panel titles: Avg Over Stages; Avg Over Stages and Seasons; Avg Over Stages and Years; Avg Over Stages, Seasons, and Years. Legend: Year 1, Season 1; Year 1, Season 2; Year 2, Season 1; Year 2, Season 2.]

Fig. 8.4 Sensitivities of equilibrium total population size in Seasons 1 and 2, as well as the annual
population average, to the demographic parameters $s_j$, $s_a$, $a$, and $b$. Left: sensitivities when $s_a = 0.4$
(seasonal 2-cycle, annual equilibrium). Right: sensitivities when $s_a = 0.1$ (seasonal 4-cycle,
annual 2-cycle)

At the 4-point seasonal cycle (2-cycle on the annual time scale), the patterns are
more complicated. We describe them in terms of the $kp = 4$ seasons in the cycle.
The maximum fertility $a$ has little effect at any point. The survival probabilities $s_j$
and $s_a$ have effects that are opposite in sign: an increase in $s_j$ increases the density
in seasons 1 and 4, and reduces it in seasons 2 and 3. An increase in $s_a$ has the
opposite effect. Averaged over years, both $s_a$ and $s_j$ increase density in season 1
and reduce it in season 2, thus increasing the amplitude of the oscillation. Averaged
over seasons, $s_a$ and $s_j$ have opposite effects. When averaged over stages, seasons,
and years, the effects of $s_a$ cancel each other out, and only $s_j$ and $b$ have appreciable
effects.

Even in this simple example, it is clear that parameter changes can have effects
that differ among seasons and years. A set of MATLAB scripts to carry out these
calculations appears in an online supplement to Caswell and Shyu (2012).


8.5 LTRE Decomposition Analysis


The LTRE decomposition analysis introduced in Sects. 2.9 and 4.5 can be extended 
to obtain the contributions, to any given outcome, of differences in parameters at 
each phase of the cycle. 

Suppose that $\xi$ is an $m \times 1$ dependent variable (scalar or vector-valued), a
function of a parameter vector $\theta$ that takes on values $\theta_1, \ldots, \theta_p$ over the cycle.
Use superscripts to denote two conditions,³ which produce results $\xi^{(1)}$ and $\xi^{(2)}$:


$$ \theta_1^{(1)}, \ldots, \theta_p^{(1)} \longrightarrow \xi^{(1)} \tag{8.68} $$
$$ \theta_1^{(2)}, \ldots, \theta_p^{(2)} \longrightarrow \xi^{(2)} \tag{8.69} $$

To first order, the effect on $\xi$ is

$$ \xi^{(2)} - \xi^{(1)} \approx \sum_{k=1}^{p} \frac{d\xi}{d\theta_k^{\mathsf T}} \left( \theta_k^{(2)} - \theta_k^{(1)} \right) \tag{8.70} $$


The $k$th term in the summation in (8.70) is the total contribution, over all of the
parameters in $\theta$, of parameter differences in phase $k$ of the cycle.

Define $R_k$ as an $m \times q$ contribution matrix whose entries are the contributions of
parameter $\theta_j$ in phase $k$ to the effects on outcome variable $\xi_i$. Then


$$ R_k = \frac{d\xi}{d\theta_k^{\mathsf T}} \, D \left( \theta_k^{(2)} - \theta_k^{(1)} \right) \tag{8.71} $$

where the derivative is evaluated at the average of $\theta^{(1)}$ and $\theta^{(2)}$. These contributions
are a decomposition of the approximate effect in (8.70),


$$ \xi^{(2)} - \xi^{(1)} \approx \sum_{k=1}^{p} R_k 1_q. \tag{8.72} $$


The contribution matrix (8.71) requires $d\xi / d\theta_k^{\mathsf T}$, that is, the derivative of $\xi$ with respect to
the parameter at phase $k$ of the cycle. In the linear model (8.2), this is given by
the $k$th term in the summation in (8.11). In the case of the nonlinear model (8.36),
the derivative is obtained from Eq. (8.46) by setting all blocks of $\mathbb{D}$, except those
corresponding to phase $k$, to zero.


³ The extension to more than two conditions is easy; see Caswell (2001).
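
A small numerical illustration of (8.70) to (8.72) for a linear model with $p = 2$ follows, written in Python with NumPy. It takes the outcome to be the growth rate $\lambda$ and the parameters at each phase to be the entries of that phase's matrix, and it assumes the standard eigenvalue sensitivity $(w^{\mathsf T} \otimes v^{\mathsf T}) / (v^{\mathsf T} w)$. The two sets of phase matrices (`cond1`, `cond2`) are hypothetical numbers chosen only to differ by small amounts.

```python
import numpy as np


def vec(M):
    """Stack the columns of M (column-major vec operator)."""
    return M.reshape(-1, order="F")


def growth_rate_and_gradient(A):
    """lambda and d lambda / d vec^T A = (w^T kron v^T) / (v^T w)."""
    vals, W = np.linalg.eig(A)
    k = np.argmax(vals.real)
    w = W[:, k].real
    vals_left, V = np.linalg.eig(A.T)
    v = V[:, np.argmax(vals_left.real)].real
    return vals[k].real, np.kron(w, v) / (v @ w)


# Two hypothetical conditions, each a list [B1, B2] of phase matrices.
cond1 = [np.array([[0.20, 2.0], [0.50, 0.8]]), np.array([[0.90, 0.0], [0.1, 0.70]])]
cond2 = [np.array([[0.25, 2.1], [0.45, 0.8]]), np.array([[0.85, 0.0], [0.1, 0.75]])]
mid = [(X + Y) / 2 for X, Y in zip(cond1, cond2)]    # derivatives evaluated here

lam_mid, grad = growth_rate_and_gradient(mid[1] @ mid[0])
I2 = np.eye(2)

# k-th term of Eq. (8.11) with theta_k = vec B_k.
dlam = [grad @ np.kron(I2, mid[1]),        # phase 1: (C_1^0)^T kron C_2^2
        grad @ np.kron(mid[0].T, I2)]      # phase 2: (C_1^1)^T kron C_3^2

# Eq. (8.71): contribution matrices R_k (here 1 x 4 rows, since lambda is scalar).
R = [d * vec(Y - X) for d, X, Y in zip(dlam, cond1, cond2)]

approx = sum(Rk.sum() for Rk in R)         # Eq. (8.72)
lam1 = growth_rate_and_gradient(cond1[1] @ cond1[0])[0]
lam2 = growth_rate_and_gradient(cond2[1] @ cond2[0])[0]
print("phase contributions:", [Rk.sum() for Rk in R])
print("sum of contributions:", approx, " observed difference:", lam2 - lam1)
```

Comparing the summed contributions with the observed difference, as in the last line, is a simple way to judge whether the first-order approximation is adequate for a given pair of conditions.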


8.6 Discussion 


The distinguishing feature of periodic models is that the dynamics over a projection 
interval are given by a periodic product of matrices. The periodic product may reflect 
the existence of multiple timescales (e.g., seasonal and annual), or the operation of 
multiple processes (e.g., demography and harvest), or express conditional proba- 
bilities, or arise from classifying individuals by multiple criteria. The sensitivity 
analysis of periodic models must account for the chain of causation (Fig. 8.1) from 
demographic parameters at each phase in the cycle to the corresponding projection 
matrices, and thence to the periodic matrix product over the whole cycle, and finally 
to demographic outcome $\xi$. Matrix calculus makes this easy to do, starting with a
simple chain rule expression (see Eq. (8.5)) and then using an appropriate version
of (8.9) to calculate the derivative $d\,\mathrm{vec}\,A / d\theta^{\mathsf T}$.

Chapter 9
LTRE Decomposition of the Stochastic Growth Rate


9.1 Introduction 


The basic unit of comparative demography is a study that reports the value of 
some demographic outcome in two populations that differ in a set of vital rates. 
One challenge of such studies is to account for the difference in outcomes by 
decomposing that difference into contributions from differences in each of the 
parameters. It frequently happens that small differences in some parameters make 
large contributions to the difference in outcomes, and vice-versa. 

In some parts of the literature, such studies are called life table response experiment
(or LTRE) analyses; versions of this analysis have appeared in Sect. 1.3.1 and
Chaps. 2, 4, and 8. The term was introduced in the context of laboratory studies
of the population effects of pollutants, hence the use of the word “experiment”
(Caswell 1989). The conditions among which the populations are compared will be
called “treatments” here, but there is no restriction to experimental manipulations.

Similar decomposition analyses have been developed independently in ecology 
and human demography. For example, Pollard’s (1988) study of life expectancy used 
methods very similar to LTRE analyses of the population growth rate. Horiuchi et al. 
(2008) developed a method for continuous variables that is essentially identical to 
that used by ecologists for regression LTRE calculations (Caswell 1996). Canudas 
Romo (2003) reviews the human demographic literature. 

This chapter uses matrix calculus to extend LTRE analysis to stochastic models,
by showing how to decompose differences in the stochastic growth rate, $\log \lambda_s$.
Because stochastic models include both environmental fluctuations and the vital
rate responses to those fluctuations, their structure is richer than that of
time-invariant models. Stochastic LTRE analysis thus requires a new approach to
decomposing these differences. The payoffs, in terms of demographic and biological
understanding, are great.


Chapter 9 is modified from: Caswell, H. 2010. Life table response experiment analysis of the
stochastic growth rate. Journal of Ecology 98:324-333. ©Hal Caswell.


9.2 Decomposition with Derivatives 


The familiar LTRE analysis uses derivatives to approximate the contributions of the
vital rates to some (vector-valued) outcome ξ (dimension q × 1), as described in
Chap. 2. Suppose that ξ depends on a vector θ of vital rates (dimension p × 1), and
that observations are available under two treatments, with

\[ \xi^{(1)} = \xi[\theta^{(1)}] \tag{9.1} \]
\[ \xi^{(2)} = \xi[\theta^{(2)}]. \tag{9.2} \]

Using matrix calculus notation, to first order,

\[ \xi^{(2)} - \xi^{(1)} \approx \frac{d\xi}{d\theta^{\mathsf{T}}} \left( \theta^{(2)} - \theta^{(1)} \right), \tag{9.3} \]

where the derivative of ξ is evaluated at the mean of the two parameter vectors.
All the contributions to the difference ξ^(2) − ξ^(1) are contained in a matrix C
(dimension q × p) given by

\[ \mathbf{C} = \frac{d\xi}{d\theta^{\mathsf{T}}} \, \mathcal{D}\left( \theta^{(2)} - \theta^{(1)} \right), \tag{9.4} \]

where the derivative is evaluated at the mean of θ^(1) and θ^(2), and D(x) denotes
the diagonal matrix with the vector x on its diagonal.
The entry C(i, j) of the contribution matrix is the contribution of the difference
Δθ_j to the difference in Δξ_i. The columns and rows of C give

    C(:, j) = contribution of Δθ_j to Δξ        (9.5)
    C(i, :) = contribution of Δθ to Δξ_i.       (9.6)


The sum over rows of C is the approximation (9.3) to the treatment effect on ξ,

\[ \xi^{(2)} - \xi^{(1)} \approx \mathbf{C} \, \mathbf{1}_p. \tag{9.7} \]

The accuracy of this approximation gives a measure of the adequacy of the first-order
assumption. Contributions can be small either because the treatment has little
effect on θ_j or because ξ does not respond much to changes in θ_j.

The contribution matrix C takes advantage of matrix calculus to provide a simple
calculation for decomposition of scalar-, vector- or matrix-valued differences.
Studies including more than two treatments or conditions are analyzed by defining
a reference parameter vector θ_r and calculating a matrix C_i for treatment i in terms
of the parameter difference θ_i − θ_r. The reference treatment might be the average
parameter set, or the parameters for a “control” condition, etc.
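The calculation in (9.3), (9.4), and (9.7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the outcome function `xi`, the two parameter vectors, and the helper names `jacobian` and `contribution_matrix` are hypothetical, and the derivative is approximated by central differences rather than by the analytical matrix calculus formulas of Chap. 2.

```python
def jacobian(xi, theta, h=1e-6):
    """Central-difference approximation to d xi / d theta^T (q x p) at theta."""
    p = len(theta)
    cols = []
    for j in range(p):
        up = list(theta)
        dn = list(theta)
        up[j] += h
        dn[j] -= h
        cols.append([(u - d) / (2 * h) for u, d in zip(xi(up), xi(dn))])
    q = len(cols[0])
    return [[cols[j][i] for j in range(p)] for i in range(q)]


def contribution_matrix(xi, theta1, theta2):
    """C = (d xi / d theta^T) D(theta2 - theta1), derivative taken at the mean."""
    mid = [(a + b) / 2 for a, b in zip(theta1, theta2)]
    J = jacobian(xi, mid)
    delta = [b - a for a, b in zip(theta1, theta2)]
    return [[J[i][j] * delta[j] for j in range(len(delta))] for i in range(len(J))]


def xi(theta):
    """Hypothetical two-dimensional outcome of two vital rates (illustration only)."""
    s, f = theta
    return [s * f, s + f]


theta1 = [0.5, 2.0]   # treatment 1 (assumed values)
theta2 = [0.8, 3.0]   # treatment 2 (assumed values)

C = contribution_matrix(xi, theta1, theta2)
approx = [sum(row) for row in C]                           # C 1_p, as in (9.7)
exact = [b - a for a, b in zip(xi(theta1), xi(theta2))]    # observed difference
```

Comparing `approx` with `exact` is the accuracy check described after (9.7). For this particular outcome the two agree up to rounding, because both components are at most quadratic in θ and the derivative is taken at the midpoint; in general they will differ.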


9.3 Kitagawa and Keyfitz: Decomposition Without 
Derivatives 


In decomposing differences in the stochastic growth rate, we encounter variables 
for which the derivatives in (9.3) cannot be calculated. Fortunately, an alternative 
method for decomposition is available that does not rely on derivatives. It was 
introduced by Kitagawa (1955) to explore the effects of age-specific death rates 
and of age distribution on crude death rates. The method was later extended 
by Keyfitz to decompose differences in age distributions, dependency ratios, and 
population growth rates into contributions from the entire mortality and fertility 
schedules (Keyfitz 1968, Section 7.4; Keyfitz and Caswell 2005, Section 10.1). 
Canudas Romo (2003) summarizes more recent extensions of the approach in 
demography. 

Suppose that ξ depends on two variables, with values (a, b) in Treatment 1 and
(A, B) in Treatment 2. Thus

\[ \xi^{(1)} = \xi[a, b] \tag{9.8} \]
\[ \xi^{(2)} = \xi[A, B]. \tag{9.9} \]

To decompose the treatment effect ξ[A, B] − ξ[a, b] into contributions from A − a
and B − b, the Kitagawa-Keyfitz method proceeds by exchanging variables between
the two treatments and calculating ξ for all possible combinations. The effect of
A − a, against the background of B, is ξ[A, B] − ξ[a, B]. The effect of A − a,
against the background of b, is ξ[A, b] − ξ[a, b]. The overall contribution of A − a
is obtained by averaging its effect against the two backgrounds:

\[ C(A - a) = \tfrac{1}{2} \left( \xi[A, B] - \xi[a, B] \right) + \tfrac{1}{2} \left( \xi[A, b] - \xi[a, b] \right). \tag{9.10} \]

Similarly, the contribution of B − b is

\[ C(B - b) = \tfrac{1}{2} \left( \xi[A, B] - \xi[A, b] \right) + \tfrac{1}{2} \left( \xi[a, B] - \xi[a, b] \right). \tag{9.11} \]

If this appears familiar, it may be because this process of averaging differences
across different backgrounds is precisely analogous to the calculation of main
effects in a two-way ANOVA (e.g., Steel and Torrie 1960, Section 11.2).
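A minimal Python sketch of the exchange-and-average calculation in (9.10) and (9.11) follows. It uses Kitagawa's original setting, a crude death rate computed from age-specific death rates and an age distribution; the function names and all numerical values are made up for illustration.

```python
def kitagawa_keyfitz(xi, a, b, A, B):
    """Contributions of (A - a) and (B - b) to xi(A, B) - xi(a, b)."""
    c_first = 0.5 * (xi(A, B) - xi(a, B)) + 0.5 * (xi(A, b) - xi(a, b))
    c_second = 0.5 * (xi(A, B) - xi(A, b)) + 0.5 * (xi(a, B) - xi(a, b))
    return c_first, c_second


def crude_death_rate(rates, age_dist):
    """Age-specific death rates weighted by the proportion in each age class."""
    return sum(m * c for m, c in zip(rates, age_dist))


# Hypothetical data for three age classes in two populations.
rates_1 = [0.010, 0.002, 0.050]
ages_1 = [0.40, 0.50, 0.10]
rates_2 = [0.006, 0.001, 0.040]
ages_2 = [0.20, 0.55, 0.25]

c_rates, c_ages = kitagawa_keyfitz(crude_death_rate, rates_1, ages_1, rates_2, ages_2)
total = crude_death_rate(rates_2, ages_2) - crude_death_rate(rates_1, ages_1)
```

With two factors the two contributions sum exactly to the total difference, so `c_rates + c_ages` reproduces `total` up to rounding; no first-order approximation is involved.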


9.4 Stochastic Population Growth 


A stochastic model contains two components: a model for the dynamics of the 
environment and a model for the response of the vital rates to the environment 
(Cohen 1979; Tuljapurkar 1990; Caswell 2001). I focus here on the stochastic 
population growth rate, log λ_s. Consider a population growing according to


n(t + 1) = A(t)n(t) (9.12) 


where the projection matrix A(t) is generated by a realization of an ergodic 
stochastic environment that produces, for every environmental state, a set of vital 
rates that satisfy certain regularity conditions. Then, the asymptotic long-term 
growth rate is, with probability one, 


\[ \log \lambda_s = \lim_{T \to \infty} \frac{1}{T} \log \left\| \mathbf{A}(T-1) \cdots \mathbf{A}(0) \, \mathbf{n}_0 \right\| \tag{9.13} \]


(Cohen 1976; Tuljapurkar and Orzack 1980; Tuljapurkar 1990). This growth rate 
plays a central role in demography and biodemography in stochastic environments, 
exactly analogous to the role played by the population growth rate λ or r = log λ
in stable population theory in constant environments. Cohen (1986) and Lee and 
Tuljapurkar (1994) have incorporated models of the form (9.12), with the addition 
of immigration terms, into the context of human population projections, to provide 
estimates of confidence intervals more rigorous than the “high, medium, low” 
scenarios usually reported. 

The additional component in stochastic environments adds an extra layer of 
complexity to the LTRE decomposition of the stochastic growth rate (Fig. 9.1). 
The difference in log λ_s between two treatments is partly due to differences in
the environmental dynamics and partly to differences in the vital rates within each 
environmental state. 

In this chapter, I consider the case in which the environment is described by a 
finite-state Markov chain. Ecological examples include years with or without fire
(Silva et al. 1991), years since fire (Caswell and Kaye 2001), years with early or late 
floods, or with high or low precipitation (Smith et al. 2005) and years with good or 
poor sea ice conditions (Hunter et al. 2010; Jenouvrier et al. 2009b). The Markovian 
environment case also includes the situation where the environment is modelled 
implicitly by selecting randomly from a set of empirically-measured matrices (e.g., 
Bierzychudek 1982; Cohen et al. 1983; Jenouvrier et al. 2009a). Let u(t) be the 
state of the environment at time t. The environmental dynamics are determined by 
the Markov chain transition matrix P, where p_ij = P[u(t + 1) = i | u(t) = j].

[Figure 9.1: flow diagrams. (a) Treatment i → Vital rates θ → Growth rate λ.
(b) Treatment i → Environmental dynamics P and Vital rate response
Θ = {θ_1, ..., θ_K} → Growth rate log λ_s.]

Fig. 9.1 The determination of population growth rate in (a) time-invariant and (b) stochastic
models. The deterministic growth rate λ is defined by a set of vital rates, which are determined
by the environment (“treatment”). The stochastic growth rate log λ_s requires an additional model
for the stochastic dynamics of the environment and a function giving the response of the vital rates
to the state of the environment


The second part of the model is the response of the vital rates to the environment.
Let θ be a vector of parameters that determine the projection matrix A. The vectors
θ_1, ..., θ_K correspond to environmental states 1, ..., K. I will write the entire set
of vital rates as

\[ \Theta = \{ \theta_1, \ldots, \theta_K \}. \tag{9.14} \]

We write A(t) = A[θ(t)], and the stochastic growth rate (9.13) becomes

\[ \log \lambda_s[\mathbf{P}, \Theta] = \lim_{T \to \infty} \frac{1}{T} \log \left\| \mathbf{A}[\theta(T-1)] \cdots \mathbf{A}[\theta(0)] \, \mathbf{n}_0 \right\| \tag{9.15} \]

where θ(t) is the parameter vector created by the environmental state u(t). I have
written log λ_s as an explicit function of P and Θ to emphasize that it depends on
both the environment and the vital rate response.
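The limit in (9.15) is usually estimated by simulation. The following Python sketch shows one common way to do so, under assumptions that are not taken from the chapter: the function names, the run length `T`, the burn-in, and the two example matrices are all hypothetical. The population vector is rescaled to sum to 1 at every step (valid for nonnegative matrices and vectors), and the logarithms of the one-step growth factors are averaged.

```python
import math
import random


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def next_state(P, j, rng):
    """Draw u(t+1) given u(t) = j from column j of the column-stochastic P."""
    r, acc = rng.random(), 0.0
    for i in range(len(P)):
        acc += P[i][j]
        if r < acc:
            return i
    return len(P) - 1


def stochastic_growth_rate(P, mats, T=20000, burn=500, seed=1):
    """Estimate log lambda_s as the time average of log one-step growth."""
    rng = random.Random(seed)
    n = len(mats[0])
    w = [1.0 / n] * n          # stage distribution, kept scaled to sum to 1
    u, total = 0, 0.0
    for t in range(burn + T):
        x = matvec(mats[u], w)
        R = sum(x)             # growth of total population size from t to t+1
        w = [xi / R for xi in x]
        if t >= burn:          # discard transient steps
            total += math.log(R)
        u = next_state(P, u, rng)
    return total / T


# Hypothetical two-state environment and 2 x 2 projection matrices.
P = [[0.7, 0.4],
     [0.3, 0.6]]               # columns sum to 1: p_ij = P[u(t+1)=i | u(t)=j]
A_good = [[1.0, 1.0], [0.5, 0.0]]
A_poor = [[0.6, 0.8], [0.3, 0.0]]

log_lambda_s = stochastic_growth_rate(P, [A_good, A_poor])
# Sanity check: with the same matrix in every state the estimate should equal
# the log of the dominant eigenvalue, here log((1 + sqrt(3)) / 2).
log_lambda_const = stochastic_growth_rate(P, [A_good, A_good], T=2000)
```

The estimate for a genuinely stochastic environment depends on the random seed and on `T`; longer runs reduce the sampling error.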


9.4.1 Environment-Specific Sensitivities 


The sensitivity of log λ_s to the vital rates was given by Tuljapurkar (1990). For the
LTRE analysis, we require the derivatives of log λ_s with respect to the parameters
in each state of the environment; i.e., to each of the vectors θ_i in Θ. These
environment-specific sensitivities were given by Caswell (2005) and independently
by Horvitz et al. (2005), and have been applied by Gervais et al. (2006), Åberg
et al. (2009), and Svensson et al. (2009). Rewriting Tuljapurkar’s (1990) formula in
matrix calculus notation yields the derivative of log λ_s with respect to the vital rate
vector in environment i:

\[ \left. \frac{d \log \lambda_s}{d \theta^{\mathsf{T}}} \right|_{u=i} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} J_t \, \frac{ \mathbf{w}^{\mathsf{T}}(t) \otimes \mathbf{v}^{\mathsf{T}}(t+1) }{ R_t \, \mathbf{v}^{\mathsf{T}}(t+1) \, \mathbf{w}(t+1) } \, \frac{ d \, \mathrm{vec} \, \mathbf{A}[\theta(t)] }{ d \theta^{\mathsf{T}} } \tag{9.16} \]

This is the stochastic analogue of the expression (3.46) in Chap. 3, for the sensitivity
of the deterministic growth rate. The vectors w(t) and v(t) are the stochastic
analogues of the right and left eigenvectors of a deterministic model, and R_t is the
growth of total population size from t to t + 1. See Caswell (2001, Section 14.4) for
a step-by-step algorithm for the calculation.

To make the sensitivity environment-dependent, J_t is an indicator variable, defined
as

\[ J_t = \begin{cases} 1 & \text{if } u(t) = i \\ 0 & \text{otherwise} \end{cases} \tag{9.17} \]

If the parameters θ consist of the elements of A, then dvec A/dθ^T = I, where I is
the identity matrix. If θ contains lower-level parameters, then dvec A/dθ^T contains
the derivatives of A with respect to these parameters.
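A sketch of how (9.16) and (9.17) might be evaluated numerically is given below. It is not the MATLAB code of Caswell (2010); it assumes that the parameters are the matrix entries themselves (so that dvec A/dθ^T = I), it returns each sensitivity as an n × n matrix rather than as a row vector, and the function names, burn-in length, and example matrices are hypothetical. The vectors w(t) are generated by forward iteration and v(t) by backward iteration along one realization of the environment.

```python
import random


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def vecmat(y, A):
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def env_sensitivities(mats, states, burn=200):
    """S[i][r][c] approximates d log lambda_s / d a_rc in environment i.

    states is a realized environmental sequence u(0), ..., u(L-1).
    """
    L, n, K = len(states), len(mats[0]), len(mats)
    # Forward pass: stage structure w(t) and one-step growth R_t.
    w, R = [[1.0 / n] * n], []
    for t in range(L):
        x = matvec(mats[states[t]], w[t])
        R.append(sum(x))
        w.append([xi / R[t] for xi in x])
    # Backward pass: reproductive value vectors, stored so that v[t] is v(t).
    v = [None] * (L + 1)
    v[L] = [1.0 / n] * n
    for t in range(L - 1, -1, -1):
        y = vecmat(v[t + 1], mats[states[t]])
        s = sum(y)
        v[t] = [yi / s for yi in y]
    # Average the terms of (9.16), discarding transients at both ends.
    S = [[[0.0] * n for _ in range(n)] for _ in range(K)]
    T = L - 2 * burn
    for t in range(burn, L - burn):
        denom = R[t] * sum(a * b for a, b in zip(v[t + 1], w[t + 1]))
        Si = S[states[t]]      # indicator J_t: only environment u(t) gets the term
        for r in range(n):
            for c in range(n):
                Si[r][c] += v[t + 1][r] * w[t][c] / (denom * T)
    return S


# Check against a constant environment, where the result must reduce to the
# deterministic sensitivity of log lambda.
A = [[1.0, 1.0], [0.5, 0.0]]
S_const = env_sensitivities([A], [0] * 600)

# Hypothetical two-state iid environment.
rng = random.Random(0)
A_poor = [[0.6, 0.8], [0.3, 0.0]]
states = [rng.randrange(2) for _ in range(5000)]
S = env_sensitivities([A, A_poor], states)
```

Summing the matrices in `S` over environments gives the sensitivity of log λ_s to a change in an entry applied in every environment, which is one way to check an implementation against Tuljapurkar’s (1990) overall formula.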


9.5 LTRE Decomposition Analysis for log às 


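Suppose now that we have two treatments, and want to decompose the difference,

\[ \log \lambda_s^{(2)} - \log \lambda_s^{(1)} = \log \lambda_s[\mathbf{P}^{(2)}, \Theta^{(2)}] - \log \lambda_s[\mathbf{P}^{(1)}, \Theta^{(1)}] \tag{9.18} \]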

into contributions. This difference compares growth in treatment 2 to growth 
in treatment 1. Treatment 1, the reference treatment, could be a control in a 
manipulative experiment, or some other specific condition of interest (as in the 
example to be considered below), or an average over treatments in a factorial 
experiment. 

The treatment effect on log λ_s in (9.18) depends on both the differences in
environmental dynamics (captured in the transition matrices P^(1) and P^(2)) and
the differences in the vital rate responses (captured in the parameter arrays Θ^(1)
and Θ^(2)). Because log λ_s is calculated numerically from (9.15) by simulation, it
cannot be differentiated¹ with respect to P, so we will use the Kitagawa-Keyfitz
decomposition for the environmental dynamics contribution, and environment-specific
derivatives (9.16) for the vital rate response contributions.


¹Well, not by me. But see Steinsaltz et al. (2011) for a rigorous development of the sensitivity
analysis of stochastic growth rates that includes the effects of changes in the entries of P.

Let us consider three cases: the case where only the vital rate responses differ, 
the case where only the environmental dynamics differ, and finally the case where 
both differ. 


9.5.1 Case 1: Vital Rates Differ, Environments Identical 


Consider two treatments that affect the vital rate responses but not the environmental 
dynamics. For example, one might want to compare low and high fertility sites 
subjected to a common fire frequency. The transition matrix P is identical in the 
two sites, but the vital rates differ. The stochastic growth rates are 


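\[ \log \lambda_s^{(1)} = \log \lambda_s[\mathbf{P}, \Theta^{(1)}] \tag{9.19} \]
\[ \log \lambda_s^{(2)} = \log \lambda_s[\mathbf{P}, \Theta^{(2)}]. \tag{9.20} \]

The difference in log λ_s is composed of contributions from vital rate differences in
each state of the environment. To first order,

\[ \log \lambda_s^{(2)} - \log \lambda_s^{(1)} \approx \sum_{i=1}^{K} \left( \left. \frac{\partial \log \lambda_s}{\partial \theta^{\mathsf{T}}} \right|_{u=i} \right) \left( \theta_i^{(2)} - \theta_i^{(1)} \right) \tag{9.21} \]

where the derivatives are environment-specific sensitivities (9.16), and are evaluated
at the mean of Θ^(1) and Θ^(2). The ith term of the summation in (9.21) is the
contribution of differences in the ith environment. These can be written as the
elements of a contribution matrix (dimension 1 × p)

\[ \mathbf{C}(\theta_i) = \left( \left. \frac{\partial \log \lambda_s}{\partial \theta^{\mathsf{T}}} \right|_{u=i} \right) \mathcal{D}\left( \theta_i^{(2)} - \theta_i^{(1)} \right), \qquad i = 1, \ldots, K. \tag{9.22} \]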


9.5.2 Case 2: Vital Rates Identical, Environments Differ 


Now consider two treatments that affect the environmental dynamics (given by
P^(1) and P^(2)) but not the vital rate responses. For example, one might compare
population growth before and after implementing a fire control strategy that changes
the frequency of fire but has no effect on how the vital rates respond to fire. The
stochastic growth rates are


\[ \log \lambda_s^{(1)} = \log \lambda_s[\mathbf{P}^{(1)}, \Theta] \tag{9.23} \]
\[ \log \lambda_s^{(2)} = \log \lambda_s[\mathbf{P}^{(2)}, \Theta]. \tag{9.24} \]

The matrices P^(1) and P^(2) may differ in their long-term frequencies of
environmental states. Those long-term frequencies are given by the stationary distributions,
i.e., the right eigenvector π corresponding to the dominant eigenvalue of P (which
always equals 1), scaled so that π sums to 1. The same frequency of environmental
states, however, can be obtained from processes with different autocorrelation
patterns, from negative autocorrelation (where states tend to alternate) to positive
autocorrelation (characterized by long runs of the same state; see Caswell and Kaye
(2001, Fig. 2) for an example). So, P^(1) and P^(2) may differ in their stationary
distributions, autocorrelation patterns, or both. To separate the contributions from
these, using the Kitagawa-Keyfitz decomposition, we construct a Markov chain with
the same stationary distribution π as P, but in which successive environmental states
are independent, and hence there is no autocorrelation. This chain has the transition
matrix

\[ \mathbf{Q} = \pi \mathbf{1}^{\mathsf{T}} \tag{9.25} \]

where 1 is a vector of ones. Because the next state is independent of the previous
state, and the same matrix is applied at each time, this process is called “independent
and identically distributed,” and abbreviated “iid.”
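The contribution to log λ_s^(2) − log λ_s^(1) of differences in P is

\[ C(\mathbf{P}) = \log \lambda_s[\mathbf{P}^{(2)}, \Theta] - \log \lambda_s[\mathbf{P}^{(1)}, \Theta]. \tag{9.26} \]

The contribution of the difference in the iid part of the environment is

\[ C(\mathbf{Q}) = \log \lambda_s[\mathbf{Q}^{(2)}, \Theta] - \log \lambda_s[\mathbf{Q}^{(1)}, \Theta]. \tag{9.27} \]

The contribution of differences in environmental autocorrelation, denoted by C(R),
is obtained by subtraction:

\[ C(\mathbf{R}) = C(\mathbf{P}) - C(\mathbf{Q}). \tag{9.28} \]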
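The construction of Q in (9.25) can be sketched as follows. The stationary distribution is found here by repeatedly applying P to a probability vector, which is an assumption of convenience (it requires an aperiodic, irreducible chain) rather than the eigenvector routine one would normally call; the function names and the example matrix are hypothetical.

```python
def stationary(P, iters=10000, tol=1e-14):
    """Stationary distribution of a column-stochastic P by power iteration."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        new = [sum(P[i][j] * pi[j] for j in range(k)) for i in range(k)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def iid_chain(P):
    """Q = pi 1^T: every column equals the stationary distribution of P."""
    pi = stationary(P)
    return [[pi[i]] * len(pi) for i in range(len(pi))]


# Hypothetical two-state environment with positive autocorrelation.
P = [[0.9, 0.3],
     [0.1, 0.7]]
pi = stationary(P)
Q = iid_chain(P)
pi_Q = stationary(Q)   # Q has the same long-term state frequencies as P
```

Given any routine that returns log λ_s for a transition matrix and a set of vital rates, C(P) and C(Q) follow from (9.26) and (9.27) by calling it with P^(1), P^(2) and with Q^(1), Q^(2), and C(R) follows from (9.28).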


9.5.3 Case 3: Vital Rates and Environments Differ 


Finally, consider two treatments that differ in both the environmental dynamics (P^(1)
and P^(2)) and the vital rate responses (Θ^(1) and Θ^(2)). The stochastic growth rates
are

\[ \log \lambda_s^{(1)} = \log \lambda_s[\mathbf{P}^{(1)}, \Theta^{(1)}] \tag{9.29} \]
\[ \log \lambda_s^{(2)} = \log \lambda_s[\mathbf{P}^{(2)}, \Theta^{(2)}]. \tag{9.30} \]

Our goal is to decompose log λ_s^(2) − log λ_s^(1) into contributions from the differences
in the stationary environmental frequencies (C(Q)), in the autocorrelation pattern
(C(R)), and in the vital rates in each environmental state (C(θ_1), ..., C(θ_K)). The
decomposition analysis proceeds in three steps.


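1. Write the contributions of the environmental differences using the Kitagawa-Keyfitz
   method

\[ C(\mathbf{P}) = \tfrac{1}{2} \left( \log \lambda_s[\mathbf{P}^{(2)}, \Theta^{(1)}] - \log \lambda_s[\mathbf{P}^{(1)}, \Theta^{(1)}] + \log \lambda_s[\mathbf{P}^{(2)}, \Theta^{(2)}] - \log \lambda_s[\mathbf{P}^{(1)}, \Theta^{(2)}] \right) \tag{9.31} \]

\[ C(\mathbf{Q}) = \tfrac{1}{2} \left( \log \lambda_s[\mathbf{Q}^{(2)}, \Theta^{(1)}] - \log \lambda_s[\mathbf{Q}^{(1)}, \Theta^{(1)}] + \log \lambda_s[\mathbf{Q}^{(2)}, \Theta^{(2)}] - \log \lambda_s[\mathbf{Q}^{(1)}, \Theta^{(2)}] \right) \tag{9.32} \]

\[ C(\mathbf{R}) = C(\mathbf{P}) - C(\mathbf{Q}) \tag{9.33} \]

   Each of C(P), C(Q), and C(R) is a scalar.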


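2. Write the contributions of the vital rate differences using the Kitagawa-Keyfitz
   method

\[ C(\Theta) = \tfrac{1}{2} \left( \log \lambda_s[\mathbf{P}^{(1)}, \Theta^{(2)}] - \log \lambda_s[\mathbf{P}^{(1)}, \Theta^{(1)}] \right) + \tfrac{1}{2} \left( \log \lambda_s[\mathbf{P}^{(2)}, \Theta^{(2)}] - \log \lambda_s[\mathbf{P}^{(2)}, \Theta^{(1)}] \right) \tag{9.34} \]

   C(Θ) is a scalar, summing the effects of differences in all of the parameter
   responses at all states of the environment. It is decomposed further in the next
   step.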

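3. Use the environment-specific derivatives of log λ_s to decompose each term
   in (9.34) into contributions from the vital rates in each environment, using (9.22)

\[ \mathbf{C}(\theta_i) = \tfrac{1}{2} \left( \left. \frac{\partial \log \lambda_s[\mathbf{P}^{(1)}, \bar{\Theta}]}{\partial \theta^{\mathsf{T}}} \right|_{u=i} \right) \mathcal{D}\left( \theta_i^{(2)} - \theta_i^{(1)} \right) + \tfrac{1}{2} \left( \left. \frac{\partial \log \lambda_s[\mathbf{P}^{(2)}, \bar{\Theta}]}{\partial \theta^{\mathsf{T}}} \right|_{u=i} \right) \mathcal{D}\left( \theta_i^{(2)} - \theta_i^{(1)} \right) \tag{9.35} \]

   for i = 1, ..., K, with the derivatives evaluated at \bar{Θ}, the mean of the vital rates
   in the two treatments being compared. C(θ_i) is a (1 × p) vector, whose
   entries give the contributions to the differences in log λ_s from each of the vital
   rates in environment i.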

The total contribution of the parameter differences given in (9.34) is

\[ C(\Theta) = \sum_{i=1}^{K} \mathbf{C}(\theta_i) \, \mathbf{1}_p. \tag{9.36} \]

These calculations are easily implemented by writing subroutines to calculate
log λ_s and the environment-specific sensitivities given a transition matrix and a
set of parameters. The accuracy of the approximations involved can be checked
by comparing

\[ \log \lambda_s^{(2)} - \log \lambda_s^{(1)} \approx C(\mathbf{Q}) + C(\mathbf{R}) + \sum_{i=1}^{K} \mathbf{C}(\theta_i) \, \mathbf{1}_p. \tag{9.37} \]
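It may help to see where the approximation in (9.37) enters. The short derivation below is a sketch, assuming that (9.31) and (9.34) are read as two-background averages in the manner of (9.10) and (9.11); the shorthand L_{jk} = log λ_s[P^(j), Θ^(k)] is introduced here only for brevity.

```latex
\begin{align*}
C(\mathbf{P}) + C(\Theta)
  &= \tfrac{1}{2}\left( L_{21} - L_{11} + L_{22} - L_{12} \right)
   + \tfrac{1}{2}\left( L_{12} - L_{11} + L_{22} - L_{21} \right) \\
  &= L_{22} - L_{11} \\
  &= \log\lambda_s^{(2)} - \log\lambda_s^{(1)} .
\end{align*}
```

Since C(P) = C(Q) + C(R) by (9.33), the Kitagawa-Keyfitz terms reproduce the treatment effect exactly when log λ_s is computed without error. Any discrepancy in (9.37) therefore reflects the first-order decomposition of C(Θ) into the environment-specific terms of step 3, together with simulation error in the estimates of log λ_s.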


9.6 An Example: Fire and an Endangered Plant 


I know of no comparative studies of stochastic population growth that include 
differences in both the environmental dynamics and the vital rate responses, so here 
is an artificial example, based on a model for an endangered plant, Lomatium brad- 
shawii, in a stochastic fire environment (Caswell and Kaye 2001). L. bradshawii 
(Apiaceae) is a polycarpic herbaceous perennial plant. It exists in only a few isolated 
populations in prairies of Oregon and Washington. These habitats were, until recent 
times, subject to natural and anthropogenic fires, to which L. bradshawii seems 
to have adapted. Fires increase plant size and seedling recruitment, but the effect 
fades within a few years. Populations in recently burned areas have higher growth 
rates and lower probabilities of extinction than unburned populations. For more 
information, see Pendergrass et al. (1999), Caswell and Kaye (2001), and Kaye 
et al. (2001). 

A stochastic demographic model for L. bradshawii was developed by Caswell 
and Kaye (2001), based on data from an experimental burning study. Individuals 
were classified into six stages based on size and reproductive status: yearlings, small 
and large vegetative plants, and small, medium, and large reproductive plants. The 
environment was classified into four states defined by the time since the most recent 
fire: the year of a fire and 1, 2, and 3+ years post-fire, and vital rates were estimated 
in each of these environmental states. The matrices are given in Caswell and Kaye 
(2001). 

Populations were studied in two sites: Fisher Butte (FB) and Rose Prairie (RP)
in western Oregon. The two sites differed in quality for L. bradshawii, with RP
superior to FB. Population growth rates were generally higher at RP than at FB
(Table 9.1), and the stochastic growth rate was higher in RP than FB at any fire
frequency. The critical fire frequency required to maintain L. bradshawii populations
was about 0.8-0.9 at FB, but only 0.4-0.5 at RP. The causes of the differences
between the sites are not known (Pendergrass et al. 1999).

Table 9.1 The population growth rate λ calculated from the environment-specific
matrices A[θ_i] for L. bradshawii. (From Caswell and Kaye 2001)

Years post-fire | Fisher Butte λ | Rose Prairie λ
0               | 1.020          | 1.155
1               | 0.984          | 1.118
2               | 0.662          | 0.483
≥3              | 0.869          | 0.906


9.6.1 The Stochastic Fire Environment 


The model for environmental dynamics is a two-state Markov chain for fires (each
year is either fire or no fire). This generates a four-state Markov chain for the
environmental states (0, 1, 2, and 3 or more years post-fire). Let f be the long-term
frequency of fire, and ρ the temporal autocorrelation coefficient of the fire process
(the magnitude of ρ determines the rate of decay of correlation as time increases,
the sign of ρ determines whether the correlation is of one sign, or oscillates). In
the two-state fire model, the probability of fire in year t + 1 if there was no fire
in year t is q = f(1 − ρ). The probability of a fire if there was a fire in year t is
p = q + ρ (see Caswell 2001, Section 14.1). The resulting transition matrix for the
four environmental states is

\[ \mathbf{P} = \begin{pmatrix} p & q & q & q \\ 1-p & 0 & 0 & 0 \\ 0 & 1-q & 0 & 0 \\ 0 & 0 & 1-q & 1-q \end{pmatrix} \tag{9.38} \]

If ρ < 0, f must satisfy

\[ \frac{-\rho}{1-\rho} \le f \le \frac{1}{1-\rho} \tag{9.39} \]

in order to keep probabilities bounded between 0 and 1. See Caswell and Kaye
(2001). Note that even if the fire process is iid, so that ρ = 0, the environmental
process given by (9.38) is not iid.
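The fire model can be written out in a few lines of Python as a check on the definitions. This is a sketch under assumptions: the function names are hypothetical, the matrix follows the column convention p_ij = P[u(t + 1) = i | u(t) = j] used earlier in the chapter, and the closed-form stationary distribution is derived here from the balance equations rather than quoted from Caswell and Kaye (2001).

```python
def fire_chain(f, rho):
    """Four-state (0, 1, 2, 3+ years post-fire) column-stochastic matrix."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("fire frequency f must lie in [0, 1]")
    if rho < 0 and not (-rho / (1 - rho) <= f <= 1 / (1 - rho)):
        raise ValueError("f violates the bounds required when rho < 0")
    q = f * (1 - rho)      # P(fire next year | no fire this year)
    p = q + rho            # P(fire next year | fire this year)
    return [
        [p,     q,     q,     q],
        [1 - p, 0.0,   0.0,   0.0],
        [0.0,   1 - q, 0.0,   0.0],
        [0.0,   0.0,   1 - q, 1 - q],
    ]


def stationary_fire(f, rho):
    """Stationary distribution from the balance equations (requires q > 0)."""
    q = f * (1 - rho)
    p = q + rho
    pi1 = (1 - p) * f
    pi2 = (1 - q) * pi1
    pi3 = (1 - q) * pi2 / q
    return [f, pi1, pi2, pi3]


# Fire frequency 0.5 with negative autocorrelation (years tend to alternate).
P = fire_chain(0.5, -0.5)
pi = stationary_fire(0.5, -0.5)
P_pi = [sum(P[i][j] * pi[j] for j in range(4)) for i in range(4)]

# With rho = 0 the fire process is iid, but the columns of P still differ,
# so the four-state environmental process is not iid.
P0 = fire_chain(0.5, 0.0)
columns_differ = [row[0] for row in P0] != [row[1] for row in P0]
```

The first entry of `pi` is the long-term frequency of fire years, which equals f by construction, and `P_pi` should reproduce `pi`.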
9.6.2 LTRE Analysis


There is no information on differences in fire dynamics at the two sites, so Caswell
and Kaye (2001) studied the response of log λ_s to the frequency and autocorrelation
of fires. Here, we use stochastic LTRE analysis to decompose the differences in
log λ_s in three hypothetical scenarios of environmental differences. I will use the
matrix entries as the vital rates θ, there being no natural lower-level parameterization
in this model. MATLAB code for the calculations is available as an appendix to
Caswell (2010).

The stochastic growth rate log λ_s increases with fire frequency at both sites.
The RP site has a growth advantage, with log λ_s^(RP) > log λ_s^(FB) at all fire
frequencies. The RP advantage, measured by log λ_s^(RP) − log λ_s^(FB), increases from
≈0.02 when f = 0 to ≈0.13 when f = 1.


Differences in vital rates and environmental transitions (Case 3). Suppose that
the two sites differ in both environmental dynamics and vital rate responses, with
fire frequencies, autocorrelations, and resulting stochastic growth of

          |  FB     |  RP
f         |  0.5    |  0.7
ρ         | −0.5    |  0.5
log λ_s   | −0.043  |  0.081

In this hypothetical scenario, the FB population tends to experience alternating years
with and without fires; in RP, there is a tendency for long runs of years with and
without fires. For additional scenarios, see Caswell (2010).

To decompose the treatment effect log λ_s^(RP) − log λ_s^(FB), we construct the
Markov chain transition matrices from (9.38), and calculate the stationary
distributions π^(FB) and π^(RP) as eigenvectors of P. For each site, we generate the iid
transition matrix Q from (9.25), and compute the contributions C(P) from (9.31),
C(Q) from (9.32), and C(R) from (9.33). Then we compute the environment-specific
sensitivities of log λ_s from (9.16), for both P^(FB) and P^(RP), and use these
to calculate the contributions C(θ_i) of the vital rates in each environmental state,
using (9.35). Finally, we sum the C(θ_i) to obtain the integrated effect of all vital
rate differences in each environment.

Figure 9.2 shows these contributions. Most of the growth rate advantage of the
RP site can be attributed to an RP advantage in A[θ_1] and A[θ_2] (the year of a
fire and the year immediately following a fire). The difference in the long-term
frequency of environmental states, and the differences in autocorrelation patterns,
make relatively little contribution.

[Figure 9.2: bar chart. Vertical axis: Contribution. Horizontal axis categories:
Q, R, A1, A2, A3, A4.]

Fig. 9.2 The contributions of the iid component of the environment (Q), the autocorrelated
component of the environment (R), and the projection matrix entries in each environmental state
(A_1, ..., A_4) to the difference in the stochastic growth rate log λ_s between the Rose Prairie (RP)
and Fisher Butte (FB) populations of Lomatium bradshawii. Calculations assume fire frequencies
of 0.5 for FB and 0.7 for RP, and autocorrelations ρ = −0.5 for FB and ρ = 0.5 for RP


The accuracy of the approximations involved in the LTRE analysis is good. The
sum of the contributions in Fig. 9.2 is 0.1192, while the actual difference in log λ_s
is 0.1219 (an accuracy of 98%).

Alternatively, suppose that some kind of fire prevention program in the RP
site reduced the fire frequency to f = 0.1 (well below the critical threshold for
persistence), but a fire management program increased the fire frequency in the FB
site to f = 0.9.

          |  FB     |  RP
f         |  0.9    |  0.1
ρ         |  0.0    |  0.0
log λ_s   |  0.027  | −0.113
Now log λ_s^(FB) > log λ_s^(RP), despite the general advantage in vital rates of RP over
FB in most environmental states. Figure 9.3 presents the contributions to Δ log λ_s
from differences in fire frequency, autocorrelation, and vital rates, and shows how
the contributions of the vital rate differences are, in this case, overwhelmed by the
RP disadvantage due to the stationary distribution of the environment.
The sum of the contributions in Fig. 9.3 is −0.1326, while the actual difference in
log λ_s is −0.1395 (an accuracy of 95%, even with a very large difference in growth
rate).

[Figure 9.3: bar chart. Vertical axis: Contribution. Horizontal axis categories:
Q, R, A1, A2, A3, A4.]

Fig. 9.3 The contributions of the iid component of the environment (Q), the autocorrelated
component of the environment (R), and the matrix entries in each environmental state (A_1, ..., A_4)
to the difference in the stochastic growth rate log λ_s between the Rose Prairie (RP) and Fisher Butte
(FB) populations of Lomatium bradshawii. Calculations assume fire frequencies of 0.9 for FB and
0.1 for RP, and autocorrelation ρ = 0 for both populations


9.7 Discussion 


This application of matrix calculus provides a general framework for decompo- 
sition analysis of the stochastic growth rate in Markovian environments. It is 
a direct generalization of the familiar LTRE approaches for time-invariant and 
periodic models, but combined with the powerful Kitagawa-Keyfitz decomposition. 
Comparative studies of the stochastic growth rate require additional data on the 
stochastic dynamics of the environment, beyond that needed for time-invariant 
models (Fig. 9.1). Many stochastic studies present conditional results; for example, 
the study of L. bradshawii provides log λ_s as a function of f, ρ, and Θ, but does
not estimate the value of log λ_s actually exhibited in either of the two sites. To do so
would require long-term data on the stochastic environment, which is hard to come 
by. However, such information may possibly be extracted from historical data (e.g., 
Smith et al. 2005; Lawler et al. 2009), or projected from climate models (Hunter 
et al. 2010; Jenouvrier et al. 2009b). 

The methods presented here are not limited to Markovian environments in which 
the environmental states have an interpretation (years since fire, flood conditions, 
etc.). They can also be used when matrices are randomly selected from a series 
collected over time (e.g., the early study of Bierzychudek (1982) based on two 
yearly matrices, or the study by Jenouvrier et al. (2009b) based on 44 years of 
matrices for emperor penguins). Although such models are indeed Markov chains, 
if years are simply a random sample of environmental variation, then it is of little 
interest to know the contribution of vital rate differences in, say, 1988 compared to 
1989 or 1987. In these models, the mean and variance of the vital rates may be of 
more interest. Davison et al. (2010), drawing on the stochastic elasticity results of
Tuljapurkar et al. (2003), have presented an approach to LTRE analysis in terms of
the contributions of differences in the mean and the variance of the vital rates. That 
method nicely complements the approach presented here. 

In the analysis of Lomatium bradshawii, even large differences in environmental 
autocorrelation made small contributions to treatment effects on log λ_s. This is not
surprising, given the generally small impact of changes in autocorrelation on the 
stochastic growth rate in this model (Caswell and Kaye 2001). It is, however, not 
guaranteed. Given the proper interaction between environmental states and the stage 
structure, autocorrelation can have dramatic impacts on the growth rate (Caswell 
2001, Example 14.1). How often this happens in nature will only be revealed by 
further comparative studies. 

Changing focus from plants in a fluctuating fire environment to human popula- 
tions projected in response to stochastic fluctuations in mortality and fertility (e.g., 
Tuljapurkar 1992; Lee and Tuljapurkar 1994), there are possibilities for applying 
this approach to population projections. However, such attempts will be challenging 
because the stochastic environments are not stationary, and the interest is not in 
asymptotic stochastic growth, but in short term transient dynamics. A combination 
of the transient analyses in Chap. 7 with the decomposition approach here might
yield interesting results.


Part IV
Nonlinear Models


Chapter 10
Sensitivity Analysis of Nonlinear
Demographic Models


10.1 Introduction 


Nonlinearities in demographic models arise due to density dependence, frequency 
dependence (in 2-sex models), feedback through the environment or the economy, 
recruitment subsidy due to immigration, and from the scaling inherent in calcula- 
tions of proportional population structure. This chapter presents a series of analyses 
particular to nonlinear models: the sensitivity and elasticity of equilibria, cycles, 
ratios (e.g., dependency ratios), age averages and variances, temporal averages and 
variances, life expectancies, and population growth rates, for both age-classified and 
stage-classified models. 

Nonlinearity is defined in contrast to linearity. If x is an age or stage distribution 
vector, and if the dynamics of x are given by 


\[ \mathbf{x}(t+1) = f[\mathbf{x}(t)], \tag{10.1} \]

then the model is linear if f(·) is a linear function, i.e., if

\[ f(a \mathbf{x}_1 + b \mathbf{x}_2) = a f(\mathbf{x}_1) + b f(\mathbf{x}_2) \tag{10.2} \]

for any constants a and b and any vectors x_1 and x_2.
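Condition (10.2) is easy to probe numerically. The sketch below compares a linear matrix model with a hypothetical density-dependent one in which fertility declines exponentially with total density; the matrix entries, the decay constant, and the function names are assumptions made for illustration. A single test point can show that a model is not linear, but agreement at one point does not prove linearity.

```python
import math


def satisfies_superposition(f, x1, x2, a, b, tol=1e-12):
    """Check f(a x1 + b x2) == a f(x1) + b f(x2) at one test point."""
    lhs = f([a * u + b * v for u, v in zip(x1, x2)])
    rhs = [a * u + b * v for u, v in zip(f(x1), f(x2))]
    return all(abs(l - r) <= tol for l, r in zip(lhs, rhs))


A = [[0.0, 2.0],
     [0.5, 0.8]]


def linear_model(x):
    """x(t+1) = A x(t) with constant vital rates."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def density_dependent_model(x):
    """Same model, but fertility decays with total density N (hypothetical)."""
    N = sum(x)
    B = [[0.0, 2.0 * math.exp(-0.1 * N)],
         [0.5, 0.8]]
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]


x1, x2 = [1.0, 2.0], [3.0, 1.0]
linear_ok = satisfies_superposition(linear_model, x1, x2, 2.0, -0.5)
nonlinear_ok = satisfies_superposition(density_dependent_model, x1, x2, 2.0, -0.5)
```

Here `linear_ok` is true and `nonlinear_ok` is false: once a vital rate depends on the population vector, the superposition property fails.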

If a model is not linear, it is nonlinear. Not surprisingly, this covers a lot of 
territory, but nonlinearity in demographic models can be classified into four main 
sources: density dependence, environmental feedback, interactions between the 
sexes, and models that arise in calculation of proportional structure. 


Chapter 10 is modified, under the terms of a Creative Commons Attribution License, from: 
Caswell, H. 2008. Perturbation analysis of nonlinear matrix population models. Demographic 
Research 18:59-116. ©Hal Caswell.

Density dependence: arises when one or more of the per-capita vital rates
are functions of the numbers or density of the population. Such effects have 
been incorporated into demographic studies of plants (e.g., Solbrig et al. 
1988; Gillman et al. 1993; Silva Matos et al. 1999; Pardini et al. 2009; 
Shyu et al. 2013) and animals (e.g., Pennycuick 1969; Clutton-Brock et al. 
1997; Cushing et al. 2003; Bonenfant et al. 2009). Density dependence has 
been intensively studied in the laboratory (e.g., Pearl et al. 1927; Frank 
et al. 1957; Costantino and Desharnais 1991; Carey et al. 1995; Mueller and 
Joshi 2000; Cushing et al. 2003). It can arise from competition for food, 
space, or other resources, or from interactions (e.g., cannibalism) among 
individuals. 

Simple density dependence is less often invoked by human demographers¹.
Weiss and Smouse (1976) proposed a density-dependent matrix model,
and Wood and Smouse (1982) applied it to the Gainj people of Papua New 
Guinea. Density dependence is included in epidemiological feedback models 
applied to a rural English population in the sixteenth and seventeenth centuries 
by Scott and Duncan (1998). 

The Easterlin effect (1961) produces density dependence in which fertility is a 
function of cohort size. Analysis of the Easterlin effect has focused mostly on the 
possibility that it could generate cycles in births (e.g., Lee 1974, 1976; Frauenthal 
and Swick 1983; Wachter and Lee 1989; Chu 1998). 

Environmental (or economic) feedback. Density-dependent models are often 
an attempt to sneak in, by the back door as it were, a feedback through 
the environment. A change in population size changes some aspect of the 
environment, which affects the vital rates, which in turn affect future population 
size. Models in which the feedback operates through resource consumption are 
the basis for the food chain and food web models that underlie models of global 
biogeochemistry (e.g., Hsu et al. 1977; Tilman 1982; Murdoch et al. 2003;
Fennel and Neumann 2004). These models are typically unstructured, but there is 
a rich literature on structured models, written as partial differential equations, to 
incorporate physiological structure and resource feedback (de Roos and Persson 
2013). 

Feedback models are also invoked in human demography, with the feedback
operating through the economy (Lee 1986, 1987; Chu 1998). An interesting
aspect of these approaches is the possibility that, if larger populations support
more robust economies, the feedback could be positive instead of negative (Lee
1986; Cohen 1995, Appendix 6). An exciting combination of ecological and
economic feedback appears in the food ratio model recently proposed by Lee
and Tuljapurkar (2008).


¹Lee (1987) reviewed the situation and said “... we might say that human demography is all about
Leslie matrices and the determinants of unconstrained growth in linear models, whereas animal
population studies are all about Malthusian equilibrium through density dependence in nonlinear
models ...”. He admits that this is an exaggeration, and there clearly are nonlinear concerns in
human demography (Bonneuil 1994), but a non-exhaustive survey finds no mention of density
dependence in several contemporary human demography texts (e.g., Hinde 1998; Preston et al.
2001; Keyfitz and Caswell 2005).

Two-sex models. To the extent that both males and females are required for 
reproduction (and, in the bigger scheme of things, this is not always so), 
demography is nonlinear because the marriage function or mating function 
cannot satisfy (10.2). Nonlinear two-sex models have a long tradition in human 
demography (see reviews in Keyfitz 1972; Pollard 1977) and have been applied 
in ecology (e.g., Lindström and Kokko 1998; Legendre et al. 1999; Kokko and 
Rankin 2006; Lenz et al. 2007; Jenouvrier et al. 2010, 2012). Their mathematical 
properties have been investigated by, e.g., Caswell and Weeks (1986), Chung
(1994) and Iannelli et al. (2005) and in a very abstract setting by Nussbaum 
(1988, 1989). 

In their most basic form, two-sex models differ from density-dependent 
models in that the vital rates depend only on the relative, not the absolute, 
abundances of stages in the population (they are sometimes called frequency- 
dependent for this reason). This has important implications for their dynamics. 

Models for proportional population structure. Even when the dynamics of 
abundance are linear, the dynamics of proportional population structure are 
nonlinear (e.g., Tuljapurkar 1997). This leads to some useful results on the 
sensitivity of the stable age or stage distribution and the reproductive value. 


Linear models lead to exponential growth and convergence to a stable structure. 
Much of their analysis focuses on the population growth rate $\lambda$ or $r = \log \lambda$.
Nonlinear models do not usually lead to exponential growth (frequency-dependent 
two-sex models are an exception). Instead, their trajectories converge to an attractor. 
The attractor may be an equilibrium point, a cycle, an invariant loop (yielding 
quasiperiodic dynamics), or a strange attractor (yielding chaotic dynamics); see 
Cushing (1998) or Caswell (2001, Chapter 16) for a detailed discussion. 

This chapter analyzes the sensitivity and elasticity of equilibria and cycles. 
Because the dynamic models considered here are discrete, solutions always exist 
and are unique. The nature and the number of the attractors depend on the specific
model. Perturbation analysis always considers perturbations of something, so the 
equilibria or cycles must be found before their perturbation properties can be 
analyzed. 


10.2 Density-Dependent Models 


We begin with the basic discrete-time² density-dependent model, written as

$$n(t+1) = A[\theta, n(t)]\, n(t) \tag{10.3}$$

where $n(t)$ is a population vector of dimension $s \times 1$ and $A$ is a population projection
matrix of dimension $s \times s$. The matrix $A$ depends on a $p \times 1$ vector $\theta$ of parameters
as well as on the current population vector $n(t)$.³

²It is possible to generalize to continuous-time models, that would be written

$$\frac{dn}{dt} = A[\theta, n(t)]\, n(t)$$

for some appropriately defined matrix function $A$; see Verdy and Caswell (2008). Such models are
less often used, but see Shyu and Caswell (2016a, 2018) for a two-sex model example.

³The explicit dependence on $\theta$ and $n(t)$ will be neglected when it is obvious from the context.


10.2.1 Linearizations Around Equilibria


An equilibrium of (10.3) satisfies

$$\hat{n} = A[\theta, \hat{n}]\, \hat{n} \tag{10.4}$$

Such an equilibrium may be stable (small perturbations from $\hat{n}$ eventually return
to the equilibrium) or unstable.⁴ That stability is determined by the linearization of
the nonlinear system (10.3) near $\hat{n}$. That is, define the deviation from $\hat{n}$ as
$z(t) = n(t) - \hat{n}$. Then $z(t)$ follows

$$z(t+1) = M[\theta, \hat{n}]\, z(t) \tag{10.5}$$

The matrix $M$ is the Jacobian matrix,

$$M = \frac{d n(t+1)}{d n^{\mathsf{T}}(t)} \tag{10.6}$$

To obtain $M$, differentiate both sides of (10.3),

$$dn(t+1) = (dA)\, n + A\, (dn) \tag{10.7}$$

Applying the vec operator to both sides gives

$$dn(t+1) = (n^{\mathsf{T}} \otimes I_s)\, d\mathrm{vec}\, A + A\, dn \tag{10.8}$$

from which

$$M = (n^{\mathsf{T}} \otimes I_s)\, \frac{\partial \mathrm{vec}\, A}{\partial n^{\mathsf{T}}} + A \tag{10.9}$$

where $I_s$ is an identity matrix of order $s$. The linearization at the equilibrium is
obtained by evaluating $M$ at $n = \hat{n}$:

$$M = (\hat{n}^{\mathsf{T}} \otimes I_s)\, \frac{\partial \mathrm{vec}\, A[\theta, \hat{n}]}{\partial n^{\mathsf{T}}} + A[\theta, \hat{n}] \tag{10.10}$$

If all the eigenvalues of $M$ are less than one in magnitude, the equilibrium $\hat{n}$ is
locally asymptotically stable. The linearization also provides valuable information
about short-term transient responses to perturbation; see Sect. 10.2.4.

⁴A careful consideration of stability requires more care with the definition of these terms, but will
not concern us here. See Caswell (2001) and Cushing (1998) for more details.
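As a numerical check of (10.9) and (10.10), the following sketch builds $M$ from the Kronecker-product formula and compares it with a brute-force finite-difference Jacobian of the map $n \mapsto A[\theta, n]\, n$. The two-stage matrix, its parameter values, and the helper names (`A_of`, `jacobian_formula`, `jacobian_direct`) are hypothetical choices made only for illustration; they are not taken from the text.

```python
import numpy as np

def A_of(n, f=2.0, s1=0.8, s2=0.6, g=0.3):
    """Hypothetical two-stage matrix; juvenile survival declines with total density."""
    sj = s1 * np.exp(-n.sum())
    return np.array([[sj * (1 - g), f],
                     [sj * g,       s2]])

def vec(X):
    """Stack the columns of X (column-major order)."""
    return X.reshape(-1, order="F")

def dvecA_dn(n, h=1e-6):
    """Central-difference approximation to d vec A / d n^T (s^2 x s)."""
    s = n.size
    D = np.zeros((s * s, s))
    for j in range(s):
        e = np.zeros(s)
        e[j] = h
        D[:, j] = vec(A_of(n + e) - A_of(n - e)) / (2 * h)
    return D

def jacobian_formula(n):
    """M = (n^T kron I_s) d vec A / d n^T + A, as in (10.9)."""
    s = n.size
    return np.kron(n.reshape(1, -1), np.eye(s)) @ dvecA_dn(n) + A_of(n)

def jacobian_direct(n, h=1e-6):
    """Finite-difference Jacobian of the full map n -> A(n) n."""
    s = n.size
    J = np.zeros((s, s))
    for j in range(s):
        e = np.zeros(s)
        e[j] = h
        J[:, j] = (A_of(n + e) @ (n + e) - A_of(n - e) @ (n - e)) / (2 * h)
    return J

# Iterate the model to its equilibrium, then linearize there.
n_hat = np.array([0.5, 0.5])
for _ in range(5000):
    n_hat = A_of(n_hat) @ n_hat

M = jacobian_formula(n_hat)
spectral_radius = max(abs(np.linalg.eigvals(M)))   # < 1 means locally stable
```

For this hypothetical model the two Jacobians should agree to within finite-difference error, and a spectral radius below one indicates a locally asymptotically stable equilibrium.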


10.2.2 Sensitivity of Equilibrium 


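Our goal is to find the derivatives of all the entries of $\hat{n}$ with respect to all of the
parameters in $\theta$; these are the entries of the $s \times p$ matrix

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}}$$

We begin by taking the differential of both sides of (10.4):

$$d\hat{n} = (dA)\,\hat{n} + A\,(d\hat{n}). \tag{10.11}$$

Rewrite this as

$$d\hat{n} = I_s\,(dA)\,\hat{n} + A\,(d\hat{n}), \tag{10.12}$$

where $I_s$ is an identity matrix of dimension $s$. Next apply the vec operator to both
sides, remembering that since $\hat{n}$ is a column vector, $\mathrm{vec}\,\hat{n} = \hat{n}$, and apply Roth’s
theorem, $\mathrm{vec}(XYZ) = (Z^{\mathsf{T}} \otimes X)\,\mathrm{vec}\,Y$, to obtain

$$d\hat{n} = (\hat{n}^{\mathsf{T}} \otimes I_s)\, d\mathrm{vec}\,A + A\,d\hat{n}. \tag{10.13}$$

However, $A$ is a function of both $\theta$ and $\hat{n}$, so

$$d\mathrm{vec}\,A = \frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}}\,d\theta + \frac{\partial \mathrm{vec}\,A}{\partial n^{\mathsf{T}}}\,d\hat{n}. \tag{10.14}$$

Substituting (10.14) into (10.13) and applying the chain rule leads to⁵

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}} = (\hat{n}^{\mathsf{T}} \otimes I_s)\left(\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}} + \frac{\partial \mathrm{vec}\,A}{\partial n^{\mathsf{T}}}\,\frac{d\hat{n}}{d\theta^{\mathsf{T}}}\right) + A\,\frac{d\hat{n}}{d\theta^{\mathsf{T}}} \tag{10.15}$$
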
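⁵It is reassuring to check that the dimensions of all these quantities are compatible:

$$\underbrace{\frac{d\hat{n}}{d\theta^{\mathsf{T}}}}_{s \times p} = \underbrace{(\hat{n}^{\mathsf{T}} \otimes I_s)}_{s \times s^2}\Bigg(\underbrace{\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}}}_{s^2 \times p} + \underbrace{\frac{\partial \mathrm{vec}\,A}{\partial n^{\mathsf{T}}}}_{s^2 \times s}\;\underbrace{\frac{d\hat{n}}{d\theta^{\mathsf{T}}}}_{s \times p}\Bigg) + \underbrace{A}_{s \times s}\;\underbrace{\frac{d\hat{n}}{d\theta^{\mathsf{T}}}}_{s \times p}$$

Finally, solve (10.15) for $d\hat{n}/d\theta^{\mathsf{T}}$ to obtain

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}} = \left(I_s - A - (\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{\partial \mathrm{vec}\,A}{\partial n^{\mathsf{T}}}\right)^{-1} (\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}} \tag{10.16}$$

where $A$, $\partial \mathrm{vec}\,A/\partial \theta^{\mathsf{T}}$, and $\partial \mathrm{vec}\,A/\partial n^{\mathsf{T}}$ are evaluated at $\hat{n}$.
Comparing (10.16) and Eq. (10.10) for the linearization, we see that the sensitivity
of equilibrium can be written

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}} = (I_s - M)^{-1}\,(\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}} \tag{10.17}$$

The matrix $(I_s - M)$ is singular if 1 is an eigenvalue of $M$; i.e., at a bifurcation
point when the equilibrium $\hat{n}$ becomes unstable. At that point, quite appropriately,
the sensitivity is not defined because the change in the equilibrium is not continuous.

The following example, applying (10.16) to a simple model, shows the basic
steps and output of the analysis.


Example 1: A simple two-stage model The most basic distinction in the life cycle
of many organisms is between non-reproducing juveniles and reproducing adults.
A model based on these stages (Neubert and Caswell 2000) is parameterized by
the juvenile survival $\sigma_1$, the adult survival $\sigma_2$, the growth or maturation probability
$\gamma$ (the expected time to maturity is $1/\gamma$), and the adult fertility $f$. The projection
matrix is

$$A = \begin{pmatrix} \sigma_1 (1 - \gamma) & f \\ \sigma_1 \gamma & \sigma_2 \end{pmatrix}. \tag{10.18}$$

Any of the vital rates could be density-dependent; here we suppose that juvenile
survival $\sigma_1$ depends on total density:

$$\sigma_1(n) = \tilde{\sigma} \exp(-1^{\mathsf{T}} n), \tag{10.19}$$

where $1$ is a vector of ones.
Define the parameter vector as $\theta = (f \;\; \gamma \;\; \tilde{\sigma} \;\; \sigma_2)^{\mathsf{T}}$. To apply (10.16) requires the
derivatives of $A[\theta, n]$ with respect to $\theta$ and with respect to $n$. These are

$$\frac{\partial \mathrm{vec}\,A}{\partial f} = \mathrm{vec} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \tag{10.20}$$

$$\frac{\partial \mathrm{vec}\,A}{\partial \gamma} = \mathrm{vec} \begin{pmatrix} -\sigma_1(n) & 0 \\ \sigma_1(n) & 0 \end{pmatrix} \tag{10.21}$$

$$\frac{\partial \mathrm{vec}\,A}{\partial \tilde{\sigma}} = \mathrm{vec} \begin{pmatrix} (1 - \gamma)\exp(-1^{\mathsf{T}} n) & 0 \\ \gamma \exp(-1^{\mathsf{T}} n) & 0 \end{pmatrix} \tag{10.22}$$

$$\frac{\partial \mathrm{vec}\,A}{\partial \sigma_2} = \mathrm{vec} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \tag{10.23}$$

$$\frac{\partial \mathrm{vec}\,A}{\partial n_1} = \frac{\partial \mathrm{vec}\,A}{\partial n_2} = \mathrm{vec} \begin{pmatrix} -\sigma_1(n)(1 - \gamma) & 0 \\ -\sigma_1(n)\gamma & 0 \end{pmatrix} \tag{10.24}$$

The derivative of $A$ with respect to $\theta$ is the $4 \times 4$ matrix

$$\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}} = \begin{pmatrix} 0 & -\sigma_1(n) & (1 - \gamma)\exp(-1^{\mathsf{T}} n) & 0 \\ 0 & \sigma_1(n) & \gamma \exp(-1^{\mathsf{T}} n) & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{10.25}$$

where each column corresponds to an entry of $\theta$ and each row to an element of
$\mathrm{vec}\,A$. The derivative of $A$ with respect to $n$ is

$$\frac{\partial \mathrm{vec}\,A}{\partial n^{\mathsf{T}}} = \begin{pmatrix} -\sigma_1(n)(1 - \gamma) & -\sigma_1(n)(1 - \gamma) \\ -\sigma_1(n)\gamma & -\sigma_1(n)\gamma \\ 0 & 0 \\ 0 & 0 \end{pmatrix}. \tag{10.26}$$

Each column corresponds to an entry of $n$ and each row to an element of $\mathrm{vec}\,A$.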
Using some arbitrary parameter values (not unreasonable for humans or other
large mammals)

$$f = 0.25, \qquad \gamma = 1/15, \qquad \tilde{\sigma} = 0.98, \qquad \sigma_2 = 0.95$$

leads to an equilibrium population

$$\hat{n} = \begin{pmatrix} 0.1053 \\ 0.1109 \end{pmatrix} \tag{10.27}$$

obtained by iterating the model to convergence.

These patterns reflect the life history, although comparative study of this 
dependence has scarcely begun. For example, if the demographic parameters were 
more appropriate for an insect, say with high fertility ($f = 70$), rapid maturation
($\gamma = 0.9$), and low juvenile survival ($\tilde{\sigma} = 0.1$), and in which most adults die after
reproducing once ($\sigma_2 = 0.01$), then the equilibrium would become

$$\hat{n} = \begin{pmatrix} 1.826 \\ 0.026 \end{pmatrix} \tag{10.28}$$

with sensitivities

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}} = \begin{pmatrix} 0.01 & 1.08 & 9.86 & 0.99 \\ -0.0002 & 0.02 & 0.14 & 0.01 \end{pmatrix} \tag{10.29}$$

In this life history, increases in fertility have very small effects on the equilibrium 
population, and the effect of increased fertility on adult density is slightly negative. 
Changes in the maturation rate or in juvenile or adult survival have much larger 
impacts on juvenile density than on adult density.
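The steps of Example 1 can be sketched numerically as follows. The code iterates the model to equilibrium, assembles the derivative matrices (10.25) and (10.26), and evaluates the sensitivity in the form (10.17). The function names are my own, and the maturation probability $\gamma = 1/15$ used for the large-mammal case is an assumption of this sketch: it is the value for which the iteration reproduces the first equilibrium entry 0.1053.

```python
import numpy as np

def sigma1(n, sig):
    """Density-dependent juvenile survival, (10.19)."""
    return sig * np.exp(-n.sum())

def A_of(theta, n):
    """Projection matrix (10.18); theta = (f, gamma, sigma~, sigma2)."""
    f, gam, sig, s2 = theta
    s1 = sigma1(n, sig)
    return np.array([[s1 * (1 - gam), f],
                     [s1 * gam,       s2]])

def equilibrium(theta, n0, steps=20000):
    """Iterate n(t+1) = A[theta, n(t)] n(t) to convergence."""
    n = np.array(n0, dtype=float)
    for _ in range(steps):
        n = A_of(theta, n) @ n
    return n

def sensitivity(theta, n):
    """d n_hat / d theta^T from (10.17), using (10.25) and (10.26)."""
    f, gam, sig, s2 = theta
    s1 = sigma1(n, sig)
    e = np.exp(-n.sum())
    # Rows follow vec A = (a11, a21, a12, a22); columns follow theta.
    dA_dtheta = np.array([[0, -s1, (1 - gam) * e, 0],
                          [0,  s1, gam * e,       0],
                          [1,  0,  0,             0],
                          [0,  0,  0,             1]], dtype=float)
    dA_dn = np.array([[-s1 * (1 - gam), -s1 * (1 - gam)],
                      [-s1 * gam,       -s1 * gam],
                      [0.0, 0.0],
                      [0.0, 0.0]])
    K = np.kron(n.reshape(1, -1), np.eye(2))      # n^T kron I_s
    M = K @ dA_dn + A_of(theta, n)                # linearization (10.10)
    return np.linalg.solve(np.eye(2) - M, K @ dA_dtheta)

mammal = np.array([0.25, 1 / 15, 0.98, 0.95])     # gamma = 1/15 is assumed
insect = np.array([70.0, 0.9, 0.1, 0.01])

n_mammal = equilibrium(mammal, n0=(0.1, 0.1))
n_insect = equilibrium(insect, n0=(1.8, 0.03))
S_mammal = sensitivity(mammal, n_mammal)
S_insect = sensitivity(insect, n_insect)
```

Each column of `S_insect` corresponds to one parameter in the order $(f, \gamma, \tilde{\sigma}, \sigma_2)$ and each row to a stage, matching the layout of (10.29).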


10.2.3 Dependent Variables: Beyond $\hat{n}$


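The equilibrium vector $\hat{n}$ is usually not the only dependent variable of interest. If
we write $\hat{m} = m(\hat{n})$ for any vector- or scalar-valued transformation of $\hat{n}$, then the
sensitivity of $\hat{m}$ is just

$$\frac{d\hat{m}}{d\theta^{\mathsf{T}}} = \frac{d\hat{m}}{d\hat{n}^{\mathsf{T}}}\,\frac{d\hat{n}}{d\theta^{\mathsf{T}}} \tag{10.30}$$
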
The possibilities for dependent variables are, roughly speaking, limited only by 
one’s imagination. The following is a list of examples. 


1. Weighted population density. Let $c \ge 0$ be a vector of weights. Weighted
population density is then $N(t) = c^{\mathsf{T}} n(t)$. Examples include total density
($c = 1$), the density of a subset of stages ($c_i = 1$ for stages to be counted;
$c_j = 0$ otherwise), biomass ($c_i$ is the biomass of stage $i$), basal area, metabolic
rate, etc. The sensitivity of $\hat{N}$ is

$$\frac{d\hat{N}}{d\theta^{\mathsf{T}}} = c^{\mathsf{T}}\,\frac{d\hat{n}}{d\theta^{\mathsf{T}}}. \tag{10.31}$$

2. Ratios, measuring the relative abundances of different stages. Let

$$R(t) = \frac{a^{\mathsf{T}} n(t)}{b^{\mathsf{T}} n(t)} \tag{10.32}$$

where $a \ge 0$ and $b \ge 0$ are weight vectors. Examples include the dependency
ratio (in human populations, the ratio of the individuals below 15 or above 65
to those between 15 and 65; see Sect. 10.5.3), the sex ratio, and the ratio of
juveniles to adults, which is used in wildlife management; see Skalski et al.
(2005). Differentiating (10.32) gives

$$\frac{d\hat{R}}{d\theta^{\mathsf{T}}} = \left(\frac{b^{\mathsf{T}}\hat{n}\,a^{\mathsf{T}} - a^{\mathsf{T}}\hat{n}\,b^{\mathsf{T}}}{(b^{\mathsf{T}}\hat{n})^2}\right)\frac{d\hat{n}}{d\theta^{\mathsf{T}}} \tag{10.33}$$



3. Age or stage averages. These include quantities such as the mean age or size in 
the stable population or at equilibrium and the mean age at reproduction in the 
stable population. Their perturbation analysis is presented in Sect. 10.5.4. 

4. Properties of cycles. Nonlinear models may produce population cycles. Attention 
may focus on the mean, the variance, or higher moments of the population vector 
or of some scalar measure of density, over such cycles. The sensitivity of these 
moments is explored in Sect. 10.7. 
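The first two items in this list reduce to one-line matrix products once $d\hat{n}/d\theta^{\mathsf{T}}$ is available. The sketch below implements (10.31) and (10.33); the function names, the three-stage vector `n0`, and the matrix `D` standing in for $d\hat{n}/d\theta^{\mathsf{T}}$ are hypothetical values chosen only to exercise the formulas.

```python
import numpy as np

def weighted_density_sensitivity(c, dn_dtheta):
    """(10.31): sensitivity of N = c^T n."""
    return c @ dn_dtheta

def ratio_sensitivity(a, b, n_hat, dn_dtheta):
    """(10.33): sensitivity of R = a^T n / b^T n."""
    an, bn = a @ n_hat, b @ n_hat
    return ((bn * a - an * b) / bn ** 2) @ dn_dtheta

# Hypothetical equilibrium (3 stages) and sensitivity matrix (3 stages x 2 parameters).
n0 = np.array([2.0, 1.0, 0.5])
D = np.array([[0.4, -1.0],
              [0.1,  0.3],
              [-0.2, 0.6]])

a = np.array([1.0, 0.0, 0.0])    # numerator: stage 1
b = np.array([0.0, 1.0, 1.0])    # denominator: stages 2 and 3

dN = weighted_density_sensitivity(np.ones(3), D)   # total density, c = 1
dR = ratio_sensitivity(a, b, n0, D)
```

Both results are row vectors with one entry per parameter, so they can be passed directly to the elasticity formulas of Sect. 10.2.5.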


10.2.4 Reactivity and Transient Dynamics


The asymptotic stability of an equilibrium is determined by the eigenvalues of the 
Jacobian matrix M in (10.9), evaluated at that equilibrium. In the short term, how- 
ever, perturbations of the population away from the equilibrium can exhibit transient 
dynamics that differ from their asymptotic behavior. In particular, perturbations of 
stable equilibria, that are destined to eventually return to the equilibrium, may move 
(much) farther away before that return. Neubert and Caswell (1997) introduced 
three indices, each calculated from $M$, to quantify these transient responses.⁶
The reactivity of an asymptotically stable equilibrium is the maximum, over all
perturbations, of the rate at which the trajectory departs from the equilibrium. At any
time following a perturbation, there is a maximum (over all perturbations) deviation
from the equilibrium. This maximum is the amplification envelope. It gives an upper
bound on the extent of transient amplification as a function of time. The phrase “over
all perturbations” in these definitions signals that the transient amplification depends
on the direction of the perturbation. The perturbation that produces the maximum
amplification at any specified time is the optimal perturbation (Verdy and Caswell
2008).⁷

⁶Because these indices are calculated from $M$, they are properly considered properties of the
system and its dynamics. Stott et al. (2011) and Stott (2016) have also considered indices of
transient response that reflect the particular initial condition rather than the inherent dynamics
of the system.

The transient dynamics of the perturbed system are described by the evolution of
the magnitude of $z$, as measured by the Euclidean norm $\|z\| = \sqrt{z^{\mathsf{T}} z}$. The reactivity
is the maximum, over all perturbations, of the growth rate of $\|z\|$, as $t \to 0$, and is
given by

$$\nu_0 = \begin{cases} \lambda_1[H(M)] & \text{continuous time} \\ \log \sigma_1(M) & \text{discrete time} \end{cases} \tag{10.34}$$

The matrix $H(M) = (M + M^{\mathsf{T}})/2$ is the Hermitian part of $M$ and $\lambda_1$ denotes
the eigenvalue with largest real part (Neubert and Caswell 1997). In discrete time,
reactivity is the log of the largest singular value of $M$, which we denote $\sigma_1(M)$.
The amplification envelope is

$$\rho(t) = \begin{cases} \sigma_1\!\left(e^{Mt}\right) & \text{continuous} \\ \sigma_1\!\left(M^t\right) & \text{discrete} \end{cases} \tag{10.35}$$

The optimal perturbation, normalized to length 1, is given by the right singular
vector corresponding to the singular value that defines $\rho(t)$.
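In discrete time all three indices follow from a singular value decomposition, as the following sketch illustrates. The matrix `M` is a hypothetical Jacobian, chosen to be asymptotically stable (eigenvalues 0.5 and 0.6) yet reactive; the function names are my own.

```python
import numpy as np

def reactivity_discrete(M):
    """Log of the largest singular value of M, (10.34)."""
    return np.log(np.linalg.svd(M, compute_uv=False)[0])

def reactivity_continuous(M):
    """Largest eigenvalue of the Hermitian part of M, (10.34)."""
    H = (M + M.T) / 2
    return np.linalg.eigvalsh(H)[-1]       # eigvalsh returns ascending order

def amplification_envelope(M, t):
    """rho(t) = sigma_1(M^t) in discrete time, (10.35), with the optimal perturbation."""
    Mt = np.linalg.matrix_power(M, t)
    _, svals, Vt = np.linalg.svd(Mt)
    return svals[0], Vt[0]                 # Vt[0] is the leading right singular vector

# Hypothetical Jacobian: stable but reactive.
M = np.array([[0.5, 2.0],
              [0.0, 0.6]])

nu0 = reactivity_discrete(M)
envelope = [amplification_envelope(M, t)[0] for t in range(1, 31)]
rho3, v3 = amplification_envelope(M, 3)
```

A positive `nu0` together with a spectral radius below one signals that some perturbations grow before they decay; `v3` is the unit-length perturbation that is amplified most after three time steps.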

Verdy and Caswell (2008) presented a complete sensitivity analysis of reactivity,
the amplification envelope, and the optimal perturbation, in both continuous and
discrete time. Let $\xi$ be one of the indices, and suppose that the model
depends on a parameter vector $\theta$. Changes in $\theta$ will change the equilibrium vector,
which will contribute to changes in the Jacobian matrix, so that the sensitivity of $\xi$
to $\theta$ is

$$\frac{d\xi}{d\theta^{\mathsf{T}}} = \frac{d\xi}{d\mathrm{vec}^{\mathsf{T}} M}\left(\frac{\partial \mathrm{vec}\,M}{\partial \theta^{\mathsf{T}}} + \frac{\partial \mathrm{vec}\,M}{\partial n^{\mathsf{T}}}\,\frac{d\hat{n}}{d\theta^{\mathsf{T}}}\right) \tag{10.36}$$

The sensitivity of $\xi$ in (10.36) requires four pieces: the linearization $M$ at the
equilibrium, which is given by (10.10), the sensitivity of the equilibrium $\hat{n}$ to the
parameters, which is given by (10.16), the sensitivity of the Jacobian matrix $M$ to
the parameters, and the sensitivity of the index $\xi$ to the matrix $M$. The sensitivity
of $\xi$ to $M$ depends on which index, but the calculations involve perturbations of
eigenvalues, singular values, or the matrix exponential, and are given in Verdy and
Caswell (2008). The derivative of the linearization $M$ is obtained by differentiating
all the terms in Eq. (10.10); the result, along with several examples, is given in Verdy
and Caswell (2008, eq. (37)).

⁷It is now known that reactivity is a common property of many ecological systems, including
populations described by discrete matrix population models (Neubert and Caswell 1997; Chen
and Cohen 2001; Neubert et al. 2004; Marvier et al. 2004; Caswell and Neubert 2005; Verdy and
Caswell 2008).


10.2.5 Elasticity Analysis 


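The derivatives in the matrix $d\hat{n}/d\theta^{\mathsf{T}}$ give the results of small additive perturbations
of the parameters. It is often useful to study the elasticities, which give the
proportional result of small proportional perturbations,

$$\frac{\epsilon \hat{n}}{\epsilon \theta^{\mathsf{T}}} = D(\hat{n})^{-1}\,\frac{d\hat{n}}{d\theta^{\mathsf{T}}}\,D(\theta), \tag{10.37}$$

where $D(x)$ denotes the diagonal matrix with $x$ on the diagonal.
The elasticity of any other (scalar- or vector-valued) dependent variable $f(\hat{n})$ is
given by

$$\frac{\epsilon f(\hat{n})}{\epsilon \theta^{\mathsf{T}}} = D\!\left[f(\hat{n})\right]^{-1}\,\frac{d f(\hat{n})}{d\theta^{\mathsf{T}}}\,D(\theta). \tag{10.38}$$

As usual, elasticities can only be calculated when $\theta > 0$ and $f(\hat{n}) > 0$.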


Example 2: Metabolic population size in Tribolium Flour beetles of the genus 
Tribolium have been the subject of a long series of experiments on nonlinear 
population dynamics (reviewed by Cushing et al. 2003). Tribolium lives in stored 
flour. In addition to feeding on the flour, adults and larvae cannibalize eggs, and 
adults cannibalize pupae. These interactions are the source of nonlinearity in the 
demography, and are captured in a three-stage (larvae, pupae, and adults) model. 
The projection matrix is 


$$A[\theta, n] = \begin{pmatrix} 0 & 0 & b \exp(-c_{el} n_1 - c_{ea} n_3) \\ 1 - \mu_l & 0 & 0 \\ 0 & \exp(-c_{pa} n_3) & 1 - \mu_a \end{pmatrix} \tag{10.39}$$

where $b$ is the clutch size, $c_{ea}$, $c_{el}$, and $c_{pa}$ are rates of cannibalism (of eggs by
adults, eggs by larvae, and pupae by adults, respectively), and $\mu_l$ and $\mu_a$ are larval
and adult mortalities (the mortality of pupae, in these laboratory conditions, is
effectively zero). Parameter values from an experiment reported by Costantino et al.
(1997)

$$\begin{aligned}
b &= 6.598 \\
c_{ea} &= 1.155 \times 10^{-2} \\
c_{el} &= 1.209 \times 10^{-2} \\
c_{pa} &= 4.7 \times 10^{-3} \\
\mu_a &= 7.729 \times 10^{-3} \\
\mu_l &= 2.055 \times 10^{-1}
\end{aligned}$$


produce a stable equilibrium 
n= 18.0 |. (10.40) 


The sensitivity of $\hat{n}$ is calculated using (10.16). However, the damage caused
by Tribolium as a pest of stored grain products might well depend more on
metabolism than on numbers. Emekci et al. (2001) estimated the metabolic
rates of larvae, pupae, and adults as 9, 1, and 4.5 μl CO₂ h⁻¹, respectively.
We define the metabolic population size as $N_m(t) = c^{\mathsf{T}} n(t)$ where
$c^{\mathsf{T}} = (9 \;\; 1 \;\; 4.5)$, and calculate the sensitivity and elasticity of $\hat{N}_m$ using (10.37)
and (10.31).

Figure 10.1 shows the elasticity of $\hat{n}$ and $\hat{N}_m$ to each of the parameters. The
elasticities are diverse and perhaps counterintuitive. Increases in fecundity increase
the equilibrium density of all stages; increases in the cannibalism of eggs by adults
reduce the density of all stages. But increased cannibalism of pupae by adults
increases the density of larvae and pupae, as does an increase in the mortality of
adults.


Fig. 10.1 Sensitivity analysis of equilibrium for the flour beetle Tribolium in Example 2. (a) The
elasticity of the equilibrium $\hat{n}$ to the parameters (see Example 2 for definitions). (b) The elasticity
of the equilibrium population respiration rate $\hat{N}_m$ to the parameters

When the stages are weighted by their metabolic rate, the elasticity of $\hat{N}_m$ to
fecundity is positive, but the elasticities to all the other parameters (cannibalism
rates and mortalities) are negative. The positive effects of $c_{pa}$ and $\mu_a$ on $\hat{n}$ disappear
when the stages are weighted according to metabolism.
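The calculations behind Example 2 can be sketched as follows. To keep the code short, the derivative matrices $\partial \mathrm{vec}\,A/\partial\theta^{\mathsf{T}}$ and $\partial \mathrm{vec}\,A/\partial n^{\mathsf{T}}$ are approximated by central differences rather than written out analytically; that shortcut, the helper names, and the starting vector for the iteration are my own assumptions, not part of the original analysis.

```python
import numpy as np

# Parameter order: b, c_ea, c_el, c_pa, mu_a, mu_l
theta0 = np.array([6.598, 1.155e-2, 1.209e-2, 4.7e-3, 7.729e-3, 2.055e-1])
c = np.array([9.0, 1.0, 4.5])             # metabolic weights for larvae, pupae, adults

def A_of(theta, n):
    """Projection matrix (10.39); n = (larvae, pupae, adults)."""
    b, cea, cel, cpa, mua, mul = theta
    return np.array([[0.0, 0.0, b * np.exp(-cel * n[0] - cea * n[2])],
                     [1 - mul, 0.0, 0.0],
                     [0.0, np.exp(-cpa * n[2]), 1 - mua]])

def vec(X):
    return X.reshape(-1, order="F")       # stack columns

def equilibrium(theta, n0=(20.0, 20.0, 300.0), steps=5000):
    n = np.array(n0)
    for _ in range(steps):
        n = A_of(theta, n) @ n
    return n

def num_jac(fun, x, rel=1e-6):
    """Central-difference Jacobian of fun at x (all entries of x assumed nonzero)."""
    cols = []
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = rel * abs(x[j])
        cols.append((fun(x + e) - fun(x - e)) / (2 * e[j]))
    return np.column_stack(cols)

n_hat = equilibrium(theta0)
s = n_hat.size
dA_dtheta = num_jac(lambda th: vec(A_of(th, n_hat)), theta0)   # s^2 x p
dA_dn = num_jac(lambda n: vec(A_of(theta0, n)), n_hat)         # s^2 x s
K = np.kron(n_hat.reshape(1, -1), np.eye(s))

# Sensitivity of equilibrium, (10.16)
dn = np.linalg.solve(np.eye(s) - A_of(theta0, n_hat) - K @ dA_dn, K @ dA_dtheta)

# Elasticities of n_hat, (10.37), and of metabolic population size, (10.31) with (10.38)
E_n = np.diag(1 / n_hat) @ dn @ np.diag(theta0)
N_m = c @ n_hat
E_Nm = (c @ dn) * theta0 / N_m
```

Row `i` of `E_n` holds the elasticities of stage `i` and column `j` corresponds to the `j`-th parameter in the order listed at the top of the block, so the signs can be compared directly with the description of Fig. 10.1.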


10.2.6 Continuous-Time Models 


We have focused on discrete-time models throughout this book. An analogous 
perturbation analysis can be carried out on continuous-time models of the form 


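$$\frac{dn}{dt} = A[n(t)]\, n(t) \tag{10.41}$$

Verdy and Caswell (2008) treat the continuous and discrete models in parallel.
The linearization at $\hat{n}$ is, once again, given by (10.10). If all the
eigenvalues of $M$ have negative real parts, the equilibrium is locally stable.
The sensitivity of the equilibrium $\hat{n}$ is

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}} = \left(-A - (\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{\partial \mathrm{vec}\,A}{\partial n^{\mathsf{T}}}\right)^{-1} (\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}} \tag{10.42}$$

with $A$ and all its derivatives evaluated at the equilibrium $\hat{n}$. Substituting (10.10) for
$M$ gives

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}} = -M^{-1}\,(\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}} \tag{10.43}$$

and $M$ is nonsingular unless 0 is an eigenvalue of $M$, which corresponds to a
bifurcation point of the equilibrium.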


10.3 Environmental Feedback Models 


Environmental (or economic) feedback models write the vital rates as functions of 
some environmental variable, which in turn depends on population density. Feed- 
back models may be static or dynamic. In static feedback models, the environment 
depends only on current conditions, with no inherent dynamics of its own. In 
dynamic feedback models, the environment can have dynamics as complicated as 
those of the population (e.g., if the environmental variable was the abundance of a 
prey species, affecting the dynamics of a predator species). The sensitivity analysis 
of dynamic feedback models is given in Sect. 10.8.

A static feedback model can be written

$$n(t+1) = A[\theta, n(t), g(t)]\, n(t) \tag{10.44}$$

$$g(t) = g[\theta, n(t)] \tag{10.45}$$

where $g(t)$ is a vector (of dimension $q \times 1$) describing the ecological or economic
aspects of the environment on which the vital rates depend. As written here, the
model admits the possibility that the vital rates in $A$ might depend directly on $n$ as
well as on the environment.

At equilibrium

$$\hat{n} = A[\theta, \hat{n}, \hat{g}]\, \hat{n} \tag{10.46}$$

$$\hat{g} = g[\theta, \hat{n}]. \tag{10.47}$$


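Differentiating these expressions gives

$$d\hat{n} = A\,(d\hat{n}) + (dA)\,\hat{n} \tag{10.48}$$

$$d\hat{g} = \frac{\partial g}{\partial \theta^{\mathsf{T}}}\,d\theta + \frac{\partial g}{\partial n^{\mathsf{T}}}\,d\hat{n}. \tag{10.49}$$

Applying the vec operator to (10.48) and expanding $d\mathrm{vec}\,A$ gives

$$d\hat{n} = (\hat{n}^{\mathsf{T}} \otimes I_s)\left[\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}}\,d\theta + \frac{\partial \mathrm{vec}\,A}{\partial g^{\mathsf{T}}}\,d\hat{g}\right] + A\,d\hat{n}. \tag{10.50}$$

Substituting (10.49) for $d\hat{g}$ and rearranging gives

$$d\hat{n} = (\hat{n}^{\mathsf{T}} \otimes I_s)\left[\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}} + \frac{\partial \mathrm{vec}\,A}{\partial g^{\mathsf{T}}}\,\frac{\partial g}{\partial \theta^{\mathsf{T}}}\right] d\theta + \left[A + (\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{\partial \mathrm{vec}\,A}{\partial g^{\mathsf{T}}}\,\frac{\partial g}{\partial n^{\mathsf{T}}}\right] d\hat{n}. \tag{10.51}$$

Solving for $d\hat{n}$ and applying the identification theorem yields

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}} = \left[I_s - A - (\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{\partial \mathrm{vec}\,A}{\partial g^{\mathsf{T}}}\,\frac{\partial g}{\partial n^{\mathsf{T}}}\right]^{-1} (\hat{n}^{\mathsf{T}} \otimes I_s)\left[\frac{\partial \mathrm{vec}\,A}{\partial \theta^{\mathsf{T}}} + \frac{\partial \mathrm{vec}\,A}{\partial g^{\mathsf{T}}}\,\frac{\partial g}{\partial \theta^{\mathsf{T}}}\right]. \tag{10.52}$$

In this expression, $A$, $g$, and all derivatives are evaluated at $(\hat{n}, \hat{g})$. A comparison
of (10.52) with (10.16) shows that including the feedback mechanism has simply
written $\partial \mathrm{vec}\,A/\partial n^{\mathsf{T}}$ and $\partial \mathrm{vec}\,A/\partial \theta^{\mathsf{T}}$ in terms of $g$ using the chain rule.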

The environmental variable g may be of interest in its own right (e.g., in the food 
ratio model of Lee and Tuljapurkar (2008), in which it is a measure of well-being, 
measured in terms of food per individual). The sensitivity of ĝ at equilibrium is 

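$$\frac{d\hat{g}}{d\theta^{\mathsf{T}}} = \frac{\partial g}{\partial \theta^{\mathsf{T}}} + \frac{\partial g}{\partial n^{\mathsf{T}}}\,\frac{d\hat{n}}{d\theta^{\mathsf{T}}} \tag{10.53}$$

where $d\hat{g}$ is given by (10.49) and $d\hat{n}/d\theta^{\mathsf{T}}$ by (10.52).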
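A small numerical sketch may help to see how (10.52) and (10.53) fit together. The model below is hypothetical: a two-stage matrix in which juvenile survival is $\tilde{\sigma} \exp(-g)$ and the scalar environment is $g = \kappa\, 1^{\mathsf{T}} n$, with $\kappa$ an invented per-capita resource-use parameter. The parameter vector is $\theta = (f, \tilde{\sigma}, \kappa)$; the constants `GAMMA` and `SIGMA2`, and all function names, are assumptions made for illustration.

```python
import numpy as np

GAMMA, SIGMA2 = 1 / 15, 0.95               # held fixed (hypothetical values)

def g_of(theta, n):
    """Static environment: g = kappa * total density."""
    return theta[2] * n.sum()

def A_of(theta, g):
    """Vital rates depend on n only through g."""
    f, sig, _ = theta
    s1 = sig * np.exp(-g)
    return np.array([[s1 * (1 - GAMMA), f],
                     [s1 * GAMMA,       SIGMA2]])

def equilibrium(theta, steps=5000):
    n = np.array([0.1, 0.1])
    for _ in range(steps):
        n = A_of(theta, g_of(theta, n)) @ n
    return n

def feedback_sensitivity(theta, n):
    f, sig, kappa = theta
    g = g_of(theta, n)
    e = np.exp(-g)
    s1 = sig * e
    # Direct effect of theta on vec A (kappa acts only through g, so its column is zero).
    dA_dtheta = np.array([[0.0, (1 - GAMMA) * e, 0.0],
                          [0.0, GAMMA * e,       0.0],
                          [1.0, 0.0,             0.0],
                          [0.0, 0.0,             0.0]])
    dA_dg = np.array([[-s1 * (1 - GAMMA)], [-s1 * GAMMA], [0.0], [0.0]])
    dg_dtheta = np.array([[0.0, 0.0, n.sum()]])
    dg_dn = kappa * np.ones((1, 2))
    K = np.kron(n.reshape(1, -1), np.eye(2))
    lhs = np.eye(2) - A_of(theta, g) - K @ dA_dg @ dg_dn
    dn = np.linalg.solve(lhs, K @ (dA_dtheta + dA_dg @ dg_dtheta))   # (10.52)
    dg = dg_dtheta + dg_dn @ dn                                      # (10.53)
    return dn, dg

theta = np.array([0.25, 0.98, 1.0])
n_hat = equilibrium(theta)
dn, dg = feedback_sensitivity(theta, n_hat)
```

Because $\kappa$ enters only through $g$, its column in `dn` comes entirely from the chain-rule term, which is the point of the comparison between (10.52) and (10.16).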


10.4 Subsidized Populations and Competition for Space 


A subsidized population is one in which new individuals are recruited from 
elsewhere rather than (or in addition to) being generated by local reproduction. 
Subsidy is important in many plant and animal populations, especially of benthic 
marine invertebrates and fish. Many of these species produce planktonic larvae that 
may disperse very long distances (Scheltema 1971) before they settle and become 
sessile for the rest of their lives. Thus a significant part—maybe even all—of the 
recruitment at any location is independent of local fertility (e.g., Almany et al. 
2007). Subsidized models have been used to analyze conservation programs in 
which captive-reared animals are released into a wild or re-established population 
(Sarrazin and Legendre 2000). They have been applied to the demography of human 
organizations; e.g., schools, businesses, learned societies (Gani 1963; Pollard 1968; 
Bartholomew 1982). They are also the basis of cohort-component population 
projections that include immigration. 

In the simplest subsidized models, both local demography and recruitment are 
density-independent. Alternatively, recruitment may depend on some resource (e.g., 
space) whose availability depends on the local population, or the local demography 
after settlement may be density-dependent. All three cases can lead to equilibrium 
populations. 


10.4.1 Density-Independent Subsidized Populations 


The model,

$$n(t+1) = A[\theta]\, n(t) + b[\theta], \tag{10.54}$$

includes a subsidy vector $b$ giving the input of individuals to the population.⁸ The
parameters $\theta$ may affect $A$ or $b$, or both. If the fertility appearing in $A$ is below
replacement, so that $\lambda < 1$, then a stable equilibrium $\hat{n}$ exists.⁹ This equilibrium
satisfies

\begin{align}
\hat{n} &= A\hat{n} + b \tag{10.55} \\
        &= (I_s - A)^{-1}\, b. \tag{10.56}
\end{align}

Differentiating (10.55) and applying the vec operator yields

$$d\hat{n} = (\hat{n}^{\mathsf{T}} \otimes I_s)\, d\mathrm{vec}\,A + A\,(d\hat{n}) + db \tag{10.57}$$

Solving for $d\hat{n}$ and applying the chain rule gives the sensitivity of the equilibrium,

$$\frac{d\hat{n}}{d\theta^{\mathsf{T}}} = (I_s - A)^{-1}\left[(\hat{n}^{\mathsf{T}} \otimes I_s)\,\frac{d\mathrm{vec}\,A}{d\theta^{\mathsf{T}}} + \frac{db}{d\theta^{\mathsf{T}}}\right]. \tag{10.58}$$
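A minimal sketch of (10.56) and (10.58), using a hypothetical three-age-class population with no local reproduction. The parameter vector $\theta = (P_1, P_2, r)$ contains two survival probabilities and a total recruitment rate $r$; the age distribution of recruits and all names in the code are invented for illustration.

```python
import numpy as np

RECRUIT_DIST = np.array([0.5, 0.3, 0.2])   # hypothetical age distribution of the subsidy

def model(theta):
    """Return (A, b) for theta = (P1, P2, r)."""
    P1, P2, r = theta
    A = np.array([[0.0, 0.0, 0.0],
                  [P1,  0.0, 0.0],
                  [0.0, P2,  0.0]])
    return A, r * RECRUIT_DIST

def equilibrium(theta):
    A, b = model(theta)
    return np.linalg.solve(np.eye(3) - A, b)          # (10.56)

def sensitivity(theta):
    A, b = model(theta)
    n = equilibrium(theta)
    s, p = 3, 3
    dvecA = np.zeros((s * s, p))
    dvecA[1, 0] = 1.0       # a21 = P1 sits at column-major position 0*3 + 1
    dvecA[5, 1] = 1.0       # a32 = P2 sits at column-major position 1*3 + 2
    db = np.zeros((s, p))
    db[:, 2] = RECRUIT_DIST                            # b depends only on r
    K = np.kron(n.reshape(1, -1), np.eye(s))
    return np.linalg.solve(np.eye(s) - A, K @ dvecA + db)   # (10.58)

theta = np.array([0.9, 0.8, 6.0])
n_hat = equilibrium(theta)
S = sensitivity(theta)
```

In this linear setting the last column of `S` is simply `n_hat / r`, since doubling the subsidy doubles the equilibrium population.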


Example 3: The Australian Academy of Science Most human organizations are
subsidized; recruits (new students in a school, new employees in a company) come
from outside, not from local reproduction. In an early example of a subsidized
population model, Pollard (1968) analyzed the age structure of the Australian
Academy of Science, recruitment to which takes place by election.¹⁰ The Academy
had been founded in 1954, and between 1955 and 1963 had elected about 6 new 
Fellows each year, with an age distribution (Pollard 1968, Table 2) given by 


Age      Percent
30-34     0.0
35-39    12.2
40-44    24.5
45-49    26.5
50-54    20.4
55-59     4.1
60-64    10.2
65-69     2.0


⁸The same model could describe harvest if $b < 0$ (e.g., Hauser et al. 2006). This form of harvest
produces unstable equilibria, and is not considered further here.

⁹If $\lambda > 1$, the population grows exponentially and the subsidy eventually becomes negligible. The
equilibrium in this case is non-positive (and hence biologically irrelevant) and unstable. If $\lambda = 1$
then the population would remain constant in the absence of subsidy; any non-zero subsidy will
then lead to unbounded population growth.

¹⁰Pollard’s paper is remarkable for its treatment of both deterministic and stochastic models, but
here I consider only the deterministic case.

Pollard interpolated this distribution to 1-year age classes, and combined it with a
1954 life table for Australian males (only one woman, the redoubtable geologist 
Dorothy Hill in 1956, had been elected to the Academy prior to 1969) to construct a 
model of the form (10.54). He calculated the equilibrium size and age composition 
of the Academy. Here, I have used the male life table for Australia 1953-1955 in 
Keyfitz and Flieger (1968, p. 558) to construct an age-classified matrix $A$ with
age-specific probabilities of survival $P_i$ on its subdiagonal and zeros elsewhere. Were
these vital rates and the age distribution of the subsidy vector to remain constant,
the Academy would reach an equilibrium size of $\hat{N} = 149.5$ with an age distribution
$\hat{n}$ shown in Fig. 10.2a.

As parameters, consider the age-specific mortality rates $\mu_i = -\log P_i$, and
define the parameter vector $\theta = (\mu_1 \;\; \mu_2 \;\; \cdots)^{\mathsf{T}}$. Equation (10.58) then gives the
sensitivity of the equilibrium population to changes in age-specific mortality. The
sensitivity of the total size of the Academy, $\hat{N} = 1^{\mathsf{T}} \hat{n}$, calculated using (10.31), is
shown in Fig. 10.2b. It shows that increases in mortality reduce $\hat{N}$ (not surprising),
with the greatest effect coming from changes in mortality at ages 48-58.

Fig. 10.2 Analysis of the equilibrium of a linear subsidized model for the Australian Academy
of Science, based on Pollard (1968). (a) The equilibrium age structure of the Academy, assuming
recruitment of 6 members per year. (b) The sensitivity, to changes in age-specific mortality, of the
number of members. (c) The sensitivity, to changes in age-specific mortality, of the proportion of
members over 70 years old

Because learned societies are often concerned with their age distributions,
Pollard (1968) examined the proportion of members over age 70. At equilibrium,
this proportion is $\hat{R} = 0.26$. The sensitivity $d\hat{R}/d\theta^{\mathsf{T}}$, calculated using (10.33),
is shown in Fig. 10.2c. Increases in mortality before age 48 would increase the
proportion of members over 70, while increases in mortality after age 48 would
decrease it.¹¹


10.4.2 Linear Subsidized Models with Competition for Space 


Recruitment in subsidized populations may be limited by the availability of a 
resource. Roughgarden et al. (1985; see also Pascual and Caswell 1991) presented a 
model for a population of sessile organisms, such as barnacles, in which recruitment 
is limited by available space. Barnacles¹² produce larvae that disperse in the
plankton for several weeks before settling onto a rock surface or other suitable 
substrate, after which they no longer move. 

Roughgarden’s model supposes that settlement is proportional to the free space
$F(t)$. Thus the subsidy vector is

$$b(t) = \begin{pmatrix} \phi F(t) & 0 & \cdots & 0 \end{pmatrix}^{\mathsf{T}}, \tag{10.59}$$

where $\phi$ is the settlement rate per unit of free space, and is determined by the pool
of available larvae. The free space is the difference between the total area $\mathcal{A}$ and the
space occupied by the population,

$$F(t) = \mathcal{A} - g^{\mathsf{T}} n(t) \tag{10.60}$$

where $g$ is a vector of stage-specific basal areas. Suppose that all locally-produced
larvae are advected away, so that the first row of the matrix $A$ is zero. Then, substituting (10.60)
into (10.59) and rearranging gives

$$n(t+1) = B\, n(t) + \begin{pmatrix} \phi \mathcal{A} & 0 & \cdots & 0 \end{pmatrix}^{\mathsf{T}} \tag{10.61}$$

where

$$B = \begin{pmatrix} -\phi g_1 & -\phi g_2 & \cdots & -\phi g_s \\ a_{21} & a_{22} & \cdots & a_{2s} \\ \vdots & \vdots & & \vdots \\ a_{s1} & a_{s2} & \cdots & a_{ss} \end{pmatrix}. \tag{10.62}$$

Although it includes competition for space, the model is linear. The equilibrium $\hat{n}$
of (10.61) is stable if the spectral radius of $B$ is less than one.¹³ The formula (10.58)
gives the sensitivity of this equilibrium to changes in the vital rates, the settlement
rate, or the individual growth rate. This model might apply to any situation where
the recruitment of new individuals depends on the availability of a resource (space,
jobs, housing) that can be monopolized by residents.

¹¹It is possible to calculate the average age of the Academy, and its sensitivity, using results to be
introduced in Sect. 10.5.4. The response is very similar to that of the proportion over age 70.

¹²The temptation to draw analogies between barnacles and the members of learned academies is
almost irresistible.


Example 4: Intertidal barnacles Gaines and Roughgarden (1985) modelled a
population of the barnacle Balanus glandula in central California. In one site
(denoted KLM in their paper), they reported age-independent survival with a
probability of $P_i = 0.985$ per week, $i = 1, \ldots, 52$. The growth in basal area
of an individual barnacle could be described by $g_x = \pi (\rho x)^2$, where $x$ is
age in weeks and $\rho$ is the radial growth rate ($\rho = 0.0041$ cm/wk). The mean
settlement rate was $\phi = 0.107$. The matrix $B$ contains survival probabilities
$P_i$ on the subdiagonal, terms of the form $-\phi g_i$ in the first row, and zeros
elsewhere.

The equilibrium population $\hat{n}$ has an exponential age distribution (Fig. 10.3a). It
is scaled here relative to total area, so $\mathcal{A} = 1$. The equilibrium proportion of free
space is $\hat{F} = 0.865$.

To calculate sensitivities, let the parameters be age-specific survival probabilities,
so that $\theta = (P_1 \;\; \cdots \;\; P_{52})^{\mathsf{T}}$. Some of the possible sensitivities are shown in Fig. 10.3.
Increasing survival at age $j$ (ages $j = 10, 20, 40$ are shown) reduces the abundance
of ages younger than $j$ and increases the abundance of ages older than $j$ (Fig. 10.3b).
A perturbation to a parameter, call it $\xi$, that affects survival at all ages would have
the effect

$$\frac{d\hat{n}}{d\xi} = \frac{d\hat{n}}{d\theta^{\mathsf{T}}}\,\frac{d\theta}{d\xi} = \frac{d\hat{n}}{d\theta^{\mathsf{T}}}\, 1 \tag{10.63}$$

where $1$ is a vector of ones. An increase in overall survival would reduce the
abundance of age classes 1-6 and increase the abundance of older age classes
(Fig. 10.3c).


^13 Because $\mathbf{B}$ contains negative elements, its dominant eigenvalue may be complex or negative,
leading to oscillatory approach to the equilibrium.

The sensitivity of $\hat{\mathbf{n}}$ to the larval settlement rate $\phi$ is obtained from (10.58) by
setting dvec B/dġ = 0 and 


s2x1> 


db A T 
P 


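Not surprisingly, increases in $\phi$ increase $\hat{\mathbf{n}}$, with the largest effect on the young age
classes (Fig. 10.3d). The sensitivity of $\hat{\mathbf{n}}$ to the radial growth rate $\rho$ is obtained by
writing

$$\frac{d\mathrm{vec}\,\mathbf{B}}{d\rho} = \frac{d\mathrm{vec}\,\mathbf{B}}{d\mathbf{g}^{\top}} \frac{d\mathbf{g}}{d\rho} \tag{10.64}$$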


This sensitivity is negative, with the greatest impact on young age classes 
(Fig. 10.3e).

Fig. 10.3 Sensitivity analysis of a subsidized population of the intertidal barnacle Balanus
glandula. (a) The equilibrium population $\hat{\mathbf{n}}$ (scaled relative to a unit of area, $A = 1$). (b) The
sensitivity of $\hat{\mathbf{n}}$ to a change in survival at ages $j = 10, 20, 40$. (c) The sensitivity of $\hat{\mathbf{n}}$ to changes
in overall survival at all ages. (d) The sensitivity of $\hat{\mathbf{n}}$ to the settlement rate $\phi$ per unit area.
(e) The sensitivity of $\hat{\mathbf{n}}$ to the radial growth rate $\rho$. (f) The sensitivity of the
equilibrium free space $\hat{F}$ to age-specific survival. (g) The sensitivity of $\hat{F}$ to changes in overall
survival, settlement rate, and radial growth rate. Based on data of Gaines and Roughgarden (1985)


Finally, the sensitivity of the equilibrium free space is given by

$$\frac{d\hat{F}}{d\boldsymbol{\theta}^{\top}} = \frac{\partial \hat{F}}{\partial \mathbf{n}^{\top}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\top}} + \frac{\partial \hat{F}}{\partial \boldsymbol{\theta}^{\top}} \tag{10.65}$$

Increases in survival reduce the amount of free space at equilibrium; the effect
is largest for changes in survival of young age classes (Fig. 10.3f). Figure 10.3g
compares the effect on $\hat{F}$ of changes in overall survival, settlement, and radial
growth rate. It is not surprising that increases in survival or settlement will reduce
free space, but perhaps surprising that increases in the radial growth rate actually
increase $\hat{F}$.


10.4.3 Density-Dependent Subsidized Models 


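Once individuals arrive in the population, they may experience a variety of density-dependent
effects that can be incorporated in a model

$$\mathbf{n}(t+1) = \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t)]\,\mathbf{n}(t) + \mathbf{b}. \tag{10.66}$$

The sensitivity result (10.58) applies to this model by substituting

$$d\mathrm{vec}\,\mathbf{A} = \frac{\partial \mathrm{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\top}}\,d\boldsymbol{\theta} + \frac{\partial \mathrm{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\top}}\,d\mathbf{n} \tag{10.67}$$

into (10.57) and solving for $d\hat{\mathbf{n}}$, to obtain

$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\top}} = \left( \mathbf{I}_s - \mathbf{A} - \left(\hat{\mathbf{n}}^{\top} \otimes \mathbf{I}_s\right) \frac{\partial \mathrm{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\top}} \right)^{-1} \left[ \left(\hat{\mathbf{n}}^{\top} \otimes \mathbf{I}_s\right) \frac{\partial \mathrm{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\top}} + \frac{d\mathbf{b}}{d\boldsymbol{\theta}^{\top}} \right] \tag{10.68}$$

where $\mathbf{A}$, $\mathbf{b}$, and all derivatives of $\mathbf{A}$ and $\mathbf{b}$ are evaluated at $\hat{\mathbf{n}}$.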


10.5 Stable Structure and Reproductive Value 


The linear model $\mathbf{n}(t+1) = \mathbf{A}\mathbf{n}(t)$ will, if $\mathbf{A}$ is primitive, converge to a stable age or
stage distribution. But while the dynamics of the population vector $\mathbf{n}(t)$ are linear,
the dynamics of the proportional population structure are nonlinear (Tuljapurkar 
1997). We can take advantage of this to analyze the sensitivity of proportional 
structures by writing them as equilibria of nonlinear maps. 


10.5.1 Stable Structure 


The sensitivity of the stable stage distribution has been approached as an eigenvector 
perturbation problem (e.g., Caswell 1982, 2001; Kirkland and Neumann 1994), but 
those calculations are complicated. Analysis of the equilibrium of the nonlinear 
model (10.69) is much easier. 

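Let $\mathbf{p}$ denote the proportional stage structure vector ($\mathbf{p} \ge 0$, $\mathbf{1}^{\top}\mathbf{p} = 1$). The
dynamics of $\mathbf{p}(t)$ satisfy

$$\mathbf{p}(t+1) = \frac{\mathbf{A}\mathbf{p}(t)}{\|\mathbf{A}\mathbf{p}(t)\|} \tag{10.69}$$

The stable stage distribution is an equilibrium of (10.69); it satisfies

$$\hat{\mathbf{p}} = \frac{\mathbf{A}\hat{\mathbf{p}}}{\mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}}} \tag{10.70}$$

where the 1-norm can be replaced by $\mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}}$ because $\hat{\mathbf{p}}$ is non-negative. Differentiating
both sides gives

$$d\hat{\mathbf{p}} = \frac{\mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}}\,(d\mathbf{A})\hat{\mathbf{p}} + \mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}}\,\mathbf{A}(d\hat{\mathbf{p}}) - \mathbf{A}\hat{\mathbf{p}}\,\mathbf{1}^{\top}(d\mathbf{A})\hat{\mathbf{p}} - \mathbf{A}\hat{\mathbf{p}}\,\mathbf{1}^{\top}\mathbf{A}(d\hat{\mathbf{p}})}{\left(\mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}}\right)^{2}} \tag{10.71}$$

Note that $\mathbf{A}\hat{\mathbf{p}} = \lambda\hat{\mathbf{p}}$ and $\mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}} = \lambda$, where $\lambda$ is the dominant eigenvalue of $\mathbf{A}$.
Making these substitutions and applying the vec operator to both sides gives

$$\lambda\,d\hat{\mathbf{p}} = \left[ \left(\hat{\mathbf{p}}^{\top} \otimes \mathbf{I}_s\right) - \left(\hat{\mathbf{p}}^{\top} \otimes \hat{\mathbf{p}}\mathbf{1}^{\top}\right) \right] d\mathrm{vec}\,\mathbf{A} + \left[ \mathbf{A} - \hat{\mathbf{p}}\mathbf{1}^{\top}\mathbf{A} \right] d\hat{\mathbf{p}} \tag{10.72}$$

Solving for $d\hat{\mathbf{p}}$ and applying the chain rule gives

$$\frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\top}} = \left( \lambda\mathbf{I}_s - \mathbf{A} + \hat{\mathbf{p}}\mathbf{1}^{\top}\mathbf{A} \right)^{-1} \left( \hat{\mathbf{p}}^{\top} \otimes \mathbf{I}_s - \hat{\mathbf{p}}^{\top} \otimes \hat{\mathbf{p}}\mathbf{1}^{\top} \right) \frac{d\mathrm{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\top}} \tag{10.73}$$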
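
As a numerical illustration of (10.73), the following Python sketch evaluates the formula with $\boldsymbol{\theta} = \mathrm{vec}\,\mathbf{A}$ (so that $d\mathrm{vec}\,\mathbf{A}/d\boldsymbol{\theta}^{\top}$ is the identity) and compares one column with a finite-difference estimate. The 3 x 3 Leslie matrix and the function names are made up for the example; they are not taken from the text.

```python
import numpy as np

def stable_structure(A):
    """Dominant eigenvalue and stable structure, scaled to sum to 1."""
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)
    p = np.abs(vecs[:, k].real)
    return vals[k].real, p / p.sum()

def dp_dvecA(A):
    """Sensitivity of the stable structure to vec A, following (10.73)."""
    s = A.shape[0]
    lam, p = stable_structure(A)
    p = p.reshape(-1, 1)                 # column vector
    one_T = np.ones((1, s))
    I = np.eye(s)
    lhs = lam * I - A + p @ one_T @ A
    rhs = np.kron(p.T, I) - np.kron(p.T, p @ one_T)
    return np.linalg.solve(lhs, rhs)     # s x s^2 matrix

# Illustrative Leslie matrix (hypothetical vital rates).
A = np.array([[0.0, 1.5, 2.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.8, 0.0]])
S = dp_dvecA(A)

# Finite-difference check for the entry in row 0, column 1.
h = 1e-6
A_pert = A.copy()
A_pert[0, 1] += h
fd = (stable_structure(A_pert)[1] - stable_structure(A)[1]) / h
col = 1 * A.shape[0] + 0     # vec stacks columns: entry (i, j) sits at j*s + i
print(np.allclose(S[:, col], fd, atol=1e-4))
```

For a parameter vector other than $\mathrm{vec}\,\mathbf{A}$, the matrix `S` would be multiplied on the right by $d\mathrm{vec}\,\mathbf{A}/d\boldsymbol{\theta}^{\top}$. Each column of `S` sums to zero, as it must because the entries of $\hat{\mathbf{p}}$ always sum to one.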


Example 5: A human age distribution As an example, consider the age
distribution of the population of the United States in 1985 (data from Keyfitz
and Flieger 1990). These vital rates yield a declining population ($\lambda = 0.975$) and an
age distribution skewed towards older ages (Fig. 10.4). Applying (10.73) yields the
sensitivity of $\hat{\mathbf{p}}$ to age-specific survival probabilities $P_i$ and fertilities $F_i$, where age
classes $i = 1, \ldots, 18$ correspond to ages 0-5, ..., 85-90. The overall patterns are
familiar from previous sensitivity analyses of stable age distributions (e.g., Caswell
2001, Figure 9.22). Increasing survival probability at a given age increases the
relative abundance of the next several age classes, at the expense of younger and
older age classes. Increasing fertility at a given age increases the abundance of
young age classes at the expense of older age classes.


10.5.2 Reproductive Value 


A similar approach gives the sensitivity of the reproductive value vector $\mathbf{v}$, given
by the left eigenvector of $\mathbf{A}$ corresponding to $\lambda$. Reproductive value is customarily
scaled so that $v_1 = 1$. Scaled in this way, $\mathbf{v}$ satisfies

$$\mathbf{v}^{\top} = \frac{\mathbf{v}^{\top}\mathbf{A}}{\mathbf{v}^{\top}\mathbf{A}\mathbf{e}_1} \tag{10.74}$$

where $\mathbf{e}_1$ is a vector with 1 in the first entry and zeros elsewhere. Differentiating
both sides gives

$$d\mathbf{v}^{\top} = \frac{1}{\left(\mathbf{v}^{\top}\mathbf{A}\mathbf{e}_1\right)^{2}} \left[ \mathbf{v}^{\top}\mathbf{A}\mathbf{e}_1\,(d\mathbf{v}^{\top})\mathbf{A} + \mathbf{v}^{\top}\mathbf{A}\mathbf{e}_1\,\mathbf{v}^{\top}(d\mathbf{A}) - (d\mathbf{v}^{\top})\mathbf{A}\mathbf{e}_1\,\mathbf{v}^{\top}\mathbf{A} - \mathbf{v}^{\top}(d\mathbf{A})\mathbf{e}_1\,\mathbf{v}^{\top}\mathbf{A} \right] \tag{10.75}$$

Fig. 10.4 Stable age distribution and sensitivity of stable age distribution to age-specific survival
and fertility. (a) The stable age distribution. (b) The sensitivity of the stable age distribution to
changes in survival ($P_5$) in age class 5. (c) Sensitivity of the stable age distribution to changes in
fertility ($F_5$) in age class 5. Based on life table data for the United States in 1985 (Keyfitz and
Flieger 1990)


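But $\mathbf{v}^{\top}\mathbf{A} = \lambda\mathbf{v}^{\top}$ and $\mathbf{v}^{\top}\mathbf{A}\mathbf{e}_1 = \lambda$. Making these substitutions and applying the vec
operator (remembering that $\mathrm{vec}\,\mathbf{v}^{\top} = \mathbf{v}$) gives

$$\lambda\,d\mathbf{v} = \left[ \left(\mathbf{I}_s \otimes \mathbf{v}^{\top}\right) - \left(\mathbf{v}\mathbf{e}_1^{\top} \otimes \mathbf{v}^{\top}\right) \right] d\mathrm{vec}\,\mathbf{A} + \left( \mathbf{A}^{\top} - \mathbf{v}\mathbf{e}_1^{\top}\mathbf{A}^{\top} \right) d\mathbf{v}. \tag{10.76}$$

Solving for $d\mathbf{v}$ and using the chain rule gives

$$\frac{d\mathbf{v}}{d\boldsymbol{\theta}^{\top}} = \left( \lambda\mathbf{I}_s - \mathbf{A}^{\top} + \mathbf{v}\mathbf{e}_1^{\top}\mathbf{A}^{\top} \right)^{-1} \left[ \left(\mathbf{I}_s \otimes \mathbf{v}^{\top}\right) - \left(\mathbf{v}\mathbf{e}_1^{\top} \otimes \mathbf{v}^{\top}\right) \right] \frac{d\mathrm{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\top}} \tag{10.77}$$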
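
The reproductive value result (10.77) can be checked in the same way as the stable structure. The Python sketch below is illustrative: it takes $\boldsymbol{\theta} = \mathrm{vec}\,\mathbf{A}$, uses a made-up 3 x 3 Leslie matrix, and compares one column of the analytical result with a finite difference. The function names are hypothetical.

```python
import numpy as np

def reproductive_value(A):
    """Dominant eigenvalue and left eigenvector, scaled so that v[0] = 1."""
    vals, vecs = np.linalg.eig(A.T)
    k = np.argmax(vals.real)
    v = vecs[:, k].real
    return vals[k].real, v / v[0]

def dv_dvecA(A):
    """Sensitivity of v (with v_1 = 1) to vec A, following (10.77)."""
    s = A.shape[0]
    lam, v = reproductive_value(A)
    v = v.reshape(-1, 1)                 # column vector
    e1_T = np.zeros((1, s))
    e1_T[0, 0] = 1.0
    I = np.eye(s)
    lhs = lam * I - A.T + v @ e1_T @ A.T
    rhs = np.kron(I, v.T) - np.kron(v @ e1_T, v.T)
    return np.linalg.solve(lhs, rhs)     # s x s^2 matrix

# Illustrative Leslie matrix (hypothetical vital rates).
A = np.array([[0.0, 1.5, 2.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.8, 0.0]])
S = dv_dvecA(A)

# Finite-difference check for the entry in row 0, column 1.
h = 1e-6
A_pert = A.copy()
A_pert[0, 1] += h
fd = (reproductive_value(A_pert)[1] - reproductive_value(A)[1]) / h
col = 1 * A.shape[0] + 0     # vec stacks columns: entry (i, j) sits at j*s + i
print(np.allclose(S[:, col], fd, atol=1e-4))
```

The first row of `S` is zero, reflecting the scaling $v_1 = 1$: no perturbation can change the first entry of $\mathbf{v}$.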


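In stable population theory, in the calculation of second derivatives of population
growth rate (Shyu and Caswell 2014), and in the analysis of multitype branching
processes for demographic stochasticity (Caswell and Vindenes 2018), it is necessary
to use the sensitivity of $\mathbf{v}$ subject to the scaling

$$\mathbf{v}^{\top}\mathbf{w} = 1. \tag{10.78}$$

The resulting derivative is

$$\frac{d\mathbf{v}}{d\boldsymbol{\theta}^{\top}} = \left( \lambda\mathbf{I} - \mathbf{A}^{\top} + \lambda\mathbf{v}\mathbf{w}^{\top} \right)^{-1} \left( \left[ \left(\mathbf{I} - \mathbf{v}\mathbf{w}^{\top}\right) \otimes \mathbf{v}^{\top} \right] - \lambda \left(\mathbf{v} \otimes \mathbf{v}^{\top}\right) \frac{d\mathbf{w}}{d\mathrm{vec}^{\top}\mathbf{A}} \right) \frac{d\mathrm{vec}\,\mathbf{A}}{d\boldsymbol{\theta}^{\top}} \tag{10.79}$$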


(see Caswell and Vindenes 2018 for derivation). 


10.5.3 Sensitivity of the Dependency Ratio 


The dependency ratio characterizes an age distribution by the relative abundance 
of two groups, one assumed to be dependent and the other productive (Keyfitz and 
Flieger 1990, p. 32). It is often assumed that persons younger than 15 or older than 
65 are dependent on productive individuals between 15 and 65. The dependency 
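ratio is defined as

$$D = \frac{\mathbf{a}^{\top}\hat{\mathbf{p}}}{\mathbf{b}^{\top}\hat{\mathbf{p}}} \tag{10.80}$$

where $\mathbf{a}$ is a vector with ones for the dependent ages and zeros otherwise, and $\mathbf{b}$ is
its complement. Applying Eq. (10.33) for the sensitivity of a ratio gives

$$\frac{dD}{d\boldsymbol{\theta}^{\top}} = \left( \frac{\mathbf{b}^{\top}\hat{\mathbf{p}}\,\mathbf{a}^{\top} - \mathbf{a}^{\top}\hat{\mathbf{p}}\,\mathbf{b}^{\top}}{\left(\mathbf{b}^{\top}\hat{\mathbf{p}}\right)^{2}} \right) \frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\top}} \tag{10.81}$$

where $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\top}$ is given by (10.73).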
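
A minimal Python sketch of the dependency ratio and of the row vector that multiplies $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\top}$ in (10.81) follows. The age distribution is made up for illustration (18 five-year age classes with geometrically declining proportions), and the function names are hypothetical.

```python
import numpy as np

def dependency_ratio(p, a, b):
    """D = a'p / b'p, the ratio of dependent to productive individuals."""
    return (a @ p) / (b @ p)

def dD_dp(p, a, b):
    """Row vector (b'p a' - a'p b') / (b'p)^2 from (10.81)."""
    return ((b @ p) * a - (a @ p) * b) / (b @ p) ** 2

# Illustrative structure: 18 five-year age classes, made-up proportions.
p = 0.9 ** np.arange(18)
p /= p.sum()

a = np.zeros(18)
a[:3] = 1.0        # ages 0-15 counted as dependent
a[13:] = 1.0       # ages 65 and over counted as dependent
b = 1.0 - a        # productive ages, the complement of a

D = dependency_ratio(p, a, b)
grad = dD_dp(p, a, b)
print(D, grad @ p)
```

Given a matrix `dp_dtheta` computed from (10.73), the sensitivity of $D$ would be `grad @ dp_dtheta`. The product `grad @ p` is zero because $D$ is unchanged when $\hat{\mathbf{p}}$ is rescaled.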

This result can be generalized in several ways. The analysis may be performed 
separately for the dependent young and the dependent old, by suitable modification 
of a and b. These two components are likely to be influenced by different 
demographic factors and can respond to perturbations in opposite directions. The 
0-1 vectors $\mathbf{a}$ and $\mathbf{b}$ may be replaced by vectors of weights; e.g., age-specific
consumption and age-specific income (Fürnkranz-Prskawetz and Sambt 2014).
For an example applied to a population projection for Spain, see Caswell and
Sanchez Gassen (2015). The analysis also applies to stage-classified models,
provided that dependent and independent stages can be identified. It also applies
to nonlinear models, with the stable stage distribution $\hat{\mathbf{p}}$ replaced by the equilibrium
population $\hat{\mathbf{n}}$ in (10.81). It can be extended to transient dynamics, where the age
distribution, and thus the dependency ratio, varies over time (Caswell 2007), as
is the case in population projections (Caswell and Sanchez Gassen 2015). Finally,
the sensitivity (10.81) makes it possible to carry out LTRE analyses to decompose
differences in dependency ratios into components due to differences in each of the
vital rates (see Chaps. 2, 8, and 9).


Example 5: (cont’d) Dependency ratios in human populations The United
States in 1985 had a set of vital rates leading to a low growth rate ($\lambda = 0.975$), and a
relatively low dependency ratio, dominated by the old. Kuwait in 1970, in contrast,
had a high growth rate ($\lambda = 1.210$) and one of the highest dependency ratios listed
in the compilation of Keyfitz and Flieger (1990), dominated by the young:


          U.S.A. 1985    Kuwait 1970
D            0.668          1.025
D_y          0.260          0.956
D_o          0.406          0.069


where $D_y$ and $D_o$ are the dependency ratios calculated for the young and old
separately. The sensitivities of $D$, $D_y$, and $D_o$ to changes in age-specific survival
and fertility are shown in Fig. 10.5. The responses of $D$ to changes in the vital rates
differ between the two countries. In the U.S., increases in fertility would reduce
$D$. In Kuwait, increases in fertility (especially at young ages) would increase $D$.
In the U.S., increases in survival^14 before age 30 reduce $D$; increases after age 30
increase $D$. In Kuwait, increases in survival, except at very young and very old ages,
reduce $D$.

Breaking $D$ into its young and old components helps to explain these differences.
In both countries, there is a crossover in survival effects. Increases in survival at
early ages increase $D_y$ and reduce $D_o$. At later ages, increases in survival reduce
$D_y$ and increase $D_o$. Increases in fertility increase $D_y$ and reduce $D_o$. In the U.S.
population, both these effects are large, with the negative effect on $D_o$ larger than
the positive effect on $D_y$. In the Kuwaiti population, the positive effect on $D_y$ is
much greater than the negative effect on $D_o$.


10.5.4 Sensitivity of Mean Age and Related Quantities 


From an age distribution p, it is possible to compute the mean age of any age-specific 
property (e.g., production of children, collection of retirement benefits, exposure to 
toxic chemicals); see Chu (1998, p. 26) for general discussions. The most familiar 
of these is the mean age of reproduction, which is one measure of generation time 
(Coale 1972). 

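^14 Or, equivalently, reductions in mortality. For these parameter values, the sensitivity to mortality
is approximately the sensitivity to survival with the opposite sign.

Fig. 10.5 Sensitivity of the dependency ratio $D$, and of its old and young components, to age-specific
survival and fertility. Left: calculated from the stable age distribution of the United States in
1985. Right: calculated from the stable age distribution of Kuwait in 1970. (a) and (b): Sensitivity
of $D$ to survival ($P_i$) and fertility ($F_i$). (c) and (d): Sensitivity of the components of $D$ to survival.
(e) and (f): Sensitivity of the components of $D$ to fertility. Life table data from Keyfitz and Flieger
(1990)

Let $\mathbf{f}$ be a vector of age-specific per-capita fertilities. The age distribution of
offspring production is then $\mathbf{f} \circ \hat{\mathbf{p}}$, where $\circ$ is the Hadamard, or element-by-element,
product. The mean age of the mothers of these offspring is obtained by normalizing
$\mathbf{f} \circ \hat{\mathbf{p}}$ to sum to 1 and taking the mean over the resulting distribution,

$$\bar{a}_f = \frac{\mathbf{c}^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})}{\mathbf{1}^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})} \tag{10.82}$$

where $\mathbf{c}$ is the vector of ages assigned to the age classes, with $s$ as the last age class.

Now differentiate $\bar{a}_f$, following the now-familiar rules for ratios. The differential
of the Hadamard product of two vectors is $d(\mathbf{a} \circ \mathbf{b}) = \mathcal{D}(\mathbf{a})\,d\mathbf{b} + \mathcal{D}(\mathbf{b})\,d\mathbf{a}$. The
result is

$$\frac{d\bar{a}_f}{d\boldsymbol{\theta}^{\top}} = \left[ \frac{\mathbf{1}^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})\,\mathbf{c}^{\top} - \mathbf{c}^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})\,\mathbf{1}^{\top}}{\left(\mathbf{1}^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})\right)^{2}} \right] \left( \mathcal{D}(\mathbf{f}) \frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\top}} + \mathcal{D}(\hat{\mathbf{p}}) \frac{d\mathbf{f}}{d\boldsymbol{\theta}^{\top}} \right) \tag{10.83}$$

where $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\top}$ is given by (10.73).

This result can be generalized in several ways. Setting $\mathbf{f} = \mathbf{1}$ makes the age-specific
property that of simply being alive, and $\bar{a} = \mathbf{c}^{\top}\hat{\mathbf{p}}$ is then the mean age of
the stable population, the sensitivity of which is

$$\frac{d\bar{a}}{d\boldsymbol{\theta}^{\top}} = \mathbf{c}^{\top} \frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\top}} \tag{10.84}$$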
The calculations can also be applied to the equilibrium population in a nonlinear
model by substituting $\hat{\mathbf{n}}$ for $\hat{\mathbf{p}}$. They apply directly to stage-classified models with
stages defined on an interval scale (e.g., size classes), in which case they give, e.g.,
the mean size at reproduction. If the stages are not evenly spaced, then $\mathbf{c}$ would be
replaced by

$$\mathbf{c}^{\top} = \left( x_1 \;\; x_2 \;\; \cdots \;\; x_s \right) \tag{10.85}$$

where $x_i$ is the value associated with stage $i$.
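
The mean age calculation and the two pieces of its derivative in (10.83) can be sketched in a few lines of Python. The vectors below are made-up illustrative values for five age classes, not data from the text, and the function names are hypothetical. Because $\mathcal{D}(\mathbf{f})$ and $\mathcal{D}(\hat{\mathbf{p}})$ are diagonal, multiplying by them reduces to element-by-element products.

```python
import numpy as np

def mean_age(c, f, p):
    """Mean of the ages c over the normalized distribution f o p."""
    x = f * p                          # Hadamard product
    return (c @ x) / x.sum()

def mean_age_gradients(c, f, p):
    """Derivatives of the mean age with respect to f and to p, from (10.83)."""
    x = f * p
    total = x.sum()
    row = (total * c - (c @ x) * np.ones_like(c)) / total ** 2
    return row * p, row * f            # row @ D(p) and row @ D(f)

c = np.array([2.5, 7.5, 12.5, 17.5, 22.5])      # mid-points of 5-year classes
f = np.array([0.0, 0.0, 0.1, 0.6, 0.4])         # hypothetical fertilities
p = np.array([0.30, 0.25, 0.20, 0.15, 0.10])    # hypothetical age distribution

a_f = mean_age(c, f, p)
d_f, d_p = mean_age_gradients(c, f, p)
print(a_f)
```

With matrices `dp_dtheta` and `df_dtheta` in hand, the full sensitivity would be `d_p @ dp_dtheta + d_f @ df_dtheta`, which is (10.83) written out term by term.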


Example 5: (cont’d) Mean age of reproduction The mean age of reproduction in
the stable age distribution of the United States in 1985 was $\bar{a}_f = 24.02$ years (using
the mid-points of the 5-year age intervals as the measure of age). The sensitivities of
$\bar{a}_f$ to changes in age-specific survival and fertility are shown in Fig. 10.6. Increases
in survival prior to age 15 reduce $\bar{a}_f$. Increases in survival after age 45 have almost
no effect on $\bar{a}_f$, because fertility is essentially zero after this age. Between age 15
and age 45, increases in survival increase the mean age of reproduction.

Increases in fertility reduce $\bar{a}_f$ if they happen before age 25 and increase $\bar{a}_f$ if they
happen after age 25. These sensitivities are quite large, although this is somewhat
irrelevant since the largest sensitivities are for ages at which fertility is zero and
unlikely to be modified.

Fig. 10.6 Sensitivity of the mean age at reproduction to changes in age-specific survival and
fertility, for the life table of the population of the United States, 1985. (Data from Keyfitz and
Flieger 1990)


10.5.5 Sensitivity of Variance in Age 


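We can also calculate the sensitivity of the higher moments. For example, the
variance in the age at reproduction is

$$V_f = \overline{a_f^2} - \left(\bar{a}_f\right)^{2}. \tag{10.86}$$

This variance might, for example, be useful as a measure of the extent of iteroparity.

The sensitivity of $V_f$ to changes in parameters is obtained by writing the first
term as

$$\overline{a_f^2} = \frac{(\mathbf{c} \circ \mathbf{c})^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})}{\mathbf{1}^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})} \tag{10.87}$$

and then differentiating

$$dV_f = d\left(\overline{a_f^2}\right) - 2\bar{a}_f\left(d\bar{a}_f\right). \tag{10.88}$$

The final result is

$$\frac{dV_f}{d\boldsymbol{\theta}^{\top}} = \left[ \frac{\mathbf{1}^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})\,(\mathbf{c} \circ \mathbf{c})^{\top} - (\mathbf{c} \circ \mathbf{c})^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})\,\mathbf{1}^{\top}}{\left(\mathbf{1}^{\top}(\mathbf{f} \circ \hat{\mathbf{p}})\right)^{2}} \right] \left( \mathcal{D}(\mathbf{f}) \frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\top}} + \mathcal{D}(\hat{\mathbf{p}}) \frac{d\mathbf{f}}{d\boldsymbol{\theta}^{\top}} \right) - 2\bar{a}_f \frac{d\bar{a}_f}{d\boldsymbol{\theta}^{\top}} \tag{10.89}$$

where $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\top}$ is given by (10.73) and $d\bar{a}_f/d\boldsymbol{\theta}^{\top}$ is given by (10.83).


10.6 Frequency-Dependent Two-Sex Models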


In sexually reproducing species, a particular sort of nonlinearity arises from the
dependence of reproduction on the relative abundance of males and females. This
dependence is captured in a marriage function or mating rule (e.g., McFarland 1972;
Pollak 1987, 1990). When the vital rates depend only on the relative, rather than the
absolute, abundance of males and females, then $\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}]$ is homogeneous of degree
0 in $\mathbf{n}$; i.e.,

$$\mathbf{A}[\boldsymbol{\theta}, c\mathbf{n}] = \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}] \quad \text{for any } c \neq 0. \tag{10.90}$$

Such models have been called frequency-dependent (Caswell and Weeks 1986;
Caswell 2001) to distinguish them from density-dependent nonlinear models that
do not have this homogeneity property.

Because of the homogeneity of $\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}]$, frequency-dependent models do not
converge to an equilibrium density $\hat{\mathbf{n}}$. Instead, there may exist^15 a stable equilibrium
proportional structure $\hat{\mathbf{p}}$ to which the population will converge, at which point it
grows exponentially at a rate $\lambda$ given by the dominant eigenvalue of $\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]$. Thus
sensitivity analysis of two-sex models must include both the population structure
and the population growth rate.

Note that matrix models that include Mendelian genetics are also homogeneous 
of degree zero, but it is confusing to call them frequency-dependent, because doing 
so creates confusion with the genetic phenomenon of frequency-dependent fitness, 
which is a different thing altogether (de Vries and Caswell 2018). 


10.6.1 Sensitivity of the Population Structure 


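The equilibrium proportional population structure $\hat{\mathbf{p}}$ satisfies

$$\hat{\mathbf{p}} = \frac{\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]\,\hat{\mathbf{p}}}{\mathbf{1}^{\top}\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]\,\hat{\mathbf{p}}} \tag{10.91}$$

where $\hat{p}_i \ge 0$ and $\mathbf{1}^{\top}\hat{\mathbf{p}} = 1$. Differentiating (10.91) gives

$$d\hat{\mathbf{p}} = \frac{\mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}} \left[ (d\mathbf{A})\hat{\mathbf{p}} + \mathbf{A}(d\hat{\mathbf{p}}) \right] - \mathbf{A}\hat{\mathbf{p}} \left[ \mathbf{1}^{\top}(d\mathbf{A})\hat{\mathbf{p}} + \mathbf{1}^{\top}\mathbf{A}(d\hat{\mathbf{p}}) \right]}{\left(\mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}}\right)^{2}} \tag{10.92}$$

Making the substitutions $\mathbf{A}\hat{\mathbf{p}} = \lambda\hat{\mathbf{p}}$ and $\mathbf{1}^{\top}\mathbf{A}\hat{\mathbf{p}} = \lambda$ and rearranging gives

$$\lambda\,d\hat{\mathbf{p}} = (d\mathbf{A})\hat{\mathbf{p}} + \mathbf{A}(d\hat{\mathbf{p}}) - \hat{\mathbf{p}}\mathbf{1}^{\top}(d\mathbf{A})\hat{\mathbf{p}} - \hat{\mathbf{p}}\mathbf{1}^{\top}\mathbf{A}(d\hat{\mathbf{p}}). \tag{10.93}$$

Applying the vec operator to both sides, expanding $d\mathrm{vec}\,\mathbf{A}$, invoking the chain rule,
and solving for $d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\top}$ gives

$$\frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\top}} = \left[ \lambda\mathbf{I}_s - \mathbf{A} + \hat{\mathbf{p}}\mathbf{1}^{\top}\mathbf{A} - \left[ \hat{\mathbf{p}}^{\top} \otimes \left(\mathbf{I}_s - \hat{\mathbf{p}}\mathbf{1}^{\top}\right) \right] \frac{\partial \mathrm{vec}\,\mathbf{A}}{\partial \mathbf{p}^{\top}} \right]^{-1} \left[ \hat{\mathbf{p}}^{\top} \otimes \left(\mathbf{I}_s - \hat{\mathbf{p}}\mathbf{1}^{\top}\right) \right] \frac{\partial \mathrm{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\top}} \tag{10.94}$$

where $\mathbf{A}$ and all derivatives are evaluated at $\hat{\mathbf{p}}$. Note that (10.94) differs from the
expression (10.73) for the stable stage distribution in the linear model only in the
term involving $\partial \mathrm{vec}\,\mathbf{A}/\partial \mathbf{p}^{\top}$, which of course is zero in the linear model.

^15 A sufficient, but not necessary, condition for the existence of an equilibrium is that $\mathbf{A}$ cannot
map a nonzero vector $\mathbf{n}$ directly to zero; necessary conditions are more difficult (Nussbaum 1988,
1989). See also Martcheva (1999).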


10.6.2 Population Growth Rate in Two-Sex Models 


Because a population with the equilibrium structure grows exponentially, I once
suggested treating $\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]$ as a constant matrix and applying eigenvalue sensitivity
analysis to it, in order to examine life history evolution in 2-sex models (Caswell
2001, p. 577). This was incorrect, because it ignored the effect of parameter changes
on $\lambda$ through their effects on the equilibrium $\hat{\mathbf{p}}$. A correct calculation obtains the
sensitivity of $\lambda$ including effects of parameters on both $\mathbf{A}$ and $\hat{\mathbf{p}}$.

Note that $\hat{\mathbf{p}}$ is a right eigenvector of $\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]$ corresponding to $\lambda$. Let $\mathbf{v}$ be the
corresponding left eigenvector, where $\mathbf{v}^{\top}\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}] = \lambda\mathbf{v}^{\top}$ and $\mathbf{v}^{\top}\hat{\mathbf{p}} = 1$. Then

$$d\lambda = \mathbf{v}^{\top}(d\mathbf{A})\hat{\mathbf{p}} \tag{10.95}$$

(Caswell 1978). Apply the vec operator and Roth’s theorem to get

$$d\lambda = \left(\hat{\mathbf{p}}^{\top} \otimes \mathbf{v}^{\top}\right) d\mathrm{vec}\,\mathbf{A}. \tag{10.96}$$

Expanding $d\mathrm{vec}\,\mathbf{A}$ gives

$$\frac{d\lambda}{d\boldsymbol{\theta}^{\top}} = \left(\hat{\mathbf{p}}^{\top} \otimes \mathbf{v}^{\top}\right) \left( \frac{\partial \mathrm{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\top}} + \frac{\partial \mathrm{vec}\,\mathbf{A}}{\partial \mathbf{p}^{\top}} \frac{d\hat{\mathbf{p}}}{d\boldsymbol{\theta}^{\top}} \right) \tag{10.97}$$

where $\mathbf{A}$, $\mathbf{v}$, and the derivatives of $\mathbf{A}$ are all evaluated at the equilibrium $\hat{\mathbf{p}}$, and
$d\hat{\mathbf{p}}/d\boldsymbol{\theta}^{\top}$ is given by (10.94).

Note that $\lambda$ is the invasion exponent for this model, and thus the sensitivity of
$\lambda$ to a parameter gives the selection gradient on that parameter. Tuljapurkar et al.
(2007) used this fact to explore the effect of male fertility patterns on the evolution
of aging; the sensitivity (10.97) could be used to generalize such results. Recent
work by Shyu has coupled these calculations to the methods of adaptive dynamics
to examine the evolution of sex ratios (Shyu and Caswell 2016a,b).

Fig. 10.7 Life cycle graph for the 2-sex model for passerine birds (Legendre et al. 1999).
Stages 1 and 2 are juvenile and adult females; stages 3 and 4 are juvenile and adult males.
Parameters are stage-specific survival probabilities $\sigma_i$, stage-specific fertilities $F_i$,
and primary sex ratio (proportion female) $\rho$

Although two-sex models are an important case of homogeneous models, they 
are not the only case. Keyfitz’s (1972) interpretation of the Easterlin hypothesis 
describes fertility as dependent on only the relative, not absolute, size of a cohort. A 
model based on this premise would be frequency-dependent (homogeneous) and 
would lead to an exponentially growing population to which (10.97) would be 
applicable. 


Example 6: A two-sex model for passerine birds Legendre et al. (1999) used a
frequency-dependent two-sex model to study the introductions of passerine birds
to New Zealand. The life cycle includes two age classes (first year and older) for
females and for males. The life cycle graph is shown in Fig. 10.7. The numbers of
females and males are $N_f = n_1 + n_2$ and $N_m = n_3 + n_4$, respectively.

Because passerines are typically monogamous within a breeding season, and
assuming that mating is indiscriminate with respect to age, Legendre et al. (1999)
used as a mating function

$$B(\mathbf{n}) = \min\left(N_f, N_m\right), \tag{10.98}$$

giving the number of matings as a function of the number of males and females. The
per-capita fertility of a female of age-class $i$ is the number of matings divided by the
number of females and multiplied by the number of surviving offspring per mating:

$$F_i(\mathbf{n}) = \frac{\sigma_0 \phi_i B(\mathbf{n})}{N_f} \tag{10.99}$$

$$\phantom{F_i(\mathbf{n})} = \begin{cases} \sigma_0 \phi_i \dfrac{N_m}{N_f} & N_f > N_m \\[2ex] \sigma_0 \phi_i & N_f \le N_m \end{cases} \tag{10.100}$$

where $\sigma_0$ is the probability of survival from fledging to age 1 and $\phi_i$ is the clutch size
of age class $i$. When males are the scarcer sex (the avian equivalent of a marriage
squeeze) fertility is proportional to the ratio of males to females. When females are
the scarcer sex, all females are mated and fertility depends only on fecundity and
neonatal survival.

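Births are allocated to females and males according to a primary sex ratio $\rho$
which gives the proportion female. The resulting two-sex projection matrix is

$$\mathbf{A}[\mathbf{n}] = \begin{pmatrix} \rho F_1(\mathbf{n}) & \rho F_2(\mathbf{n}) & 0 & 0 \\ \sigma_1 & \sigma_2 & 0 & 0 \\ (1-\rho) F_1(\mathbf{n}) & (1-\rho) F_2(\mathbf{n}) & 0 & 0 \\ 0 & 0 & \sigma_3 & \sigma_4 \end{pmatrix} \tag{10.101}$$

Legendre et al. (1999) assigned typical values for passerine birds of $\sigma_0 = 0.2$,
$\phi_i = 7$, and $\rho = 0.5$. They set male and female survival equal ($\sigma_1 = \sigma_3 = 0.35$,
$\sigma_2 = \sigma_4 = 0.4$), but this is a pathological special case in this model, so instead I
consider two cases, one in which male mortality is higher than female mortality, and
one in which the difference is reversed.^16 The survival probabilities and equilibrium
population structures are

$$\boldsymbol{\sigma} = \begin{pmatrix} 0.35 \\ 0.5 \\ 0.25 \\ 0.4 \end{pmatrix} \qquad \hat{\mathbf{p}} = \begin{pmatrix} 0.320 \\ 0.226 \\ 0.320 \\ 0.134 \end{pmatrix} \tag{10.102}$$

$$\boldsymbol{\sigma} = \begin{pmatrix} 0.25 \\ 0.4 \\ 0.35 \\ 0.5 \end{pmatrix} \qquad \hat{\mathbf{p}} = \begin{pmatrix} 0.320 \\ 0.134 \\ 0.320 \\ 0.226 \end{pmatrix} \tag{10.103}$$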
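
The two equilibrium structures can be reproduced by iterating the normalized projection (10.91) until it settles. The following Python sketch is illustrative: the function names are hypothetical, the starting vector is arbitrary, and the number of iterations is simply chosen large enough for this example. The survival vectors are read as $(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$, which is an assumption about the ordering in (10.102) and (10.103).

```python
import numpy as np

def two_sex_matrix(n, sigma, sigma0=0.2, phi=(7.0, 7.0), rho=0.5):
    """Projection matrix (10.101) with the minimum mating function (10.98)."""
    Nf, Nm = n[0] + n[1], n[2] + n[3]
    matings_per_female = min(Nf, Nm) / Nf
    F1 = sigma0 * phi[0] * matings_per_female
    F2 = sigma0 * phi[1] * matings_per_female
    s1, s2, s3, s4 = sigma
    return np.array([
        [rho * F1,       rho * F2,       0.0, 0.0],
        [s1,             s2,             0.0, 0.0],
        [(1 - rho) * F1, (1 - rho) * F2, 0.0, 0.0],
        [0.0,            0.0,            s3,  s4],
    ])

def equilibrium_structure(sigma, steps=500):
    """Iterate p(t+1) = A[p] p / (1' A[p] p) and return (p_hat, lambda)."""
    p = np.full(4, 0.25)
    for _ in range(steps):
        q = two_sex_matrix(p, sigma) @ p
        p = q / q.sum()
    lam = (two_sex_matrix(p, sigma) @ p).sum()
    return p, lam

p_males_rare, lam_males_rare = equilibrium_structure((0.35, 0.5, 0.25, 0.4))
p_females_rare, lam_females_rare = equilibrium_structure((0.25, 0.4, 0.35, 0.5))
print(np.round(p_males_rare, 3), np.round(p_females_rare, 3))
```

Because the matrix is homogeneous of degree zero, it can be evaluated at the proportions `p` rather than at absolute abundances, which is what makes this normalized iteration possible.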


The elasticities of $\hat{\mathbf{p}}$ to each of the parameters, calculated from (10.94), are
shown in Table 10.1. Regardless of which sex is scarcer, increasing neonatal survival
increases the proportion of young, at the expense of the proportion of adults, in both
sexes. Increasing the sex ratio $\rho$ increases the proportion of females at the expense
of males. Increasing female survival ($\sigma_1$ or $\sigma_2$) increases the proportion of adult
females at the expense of all other stages; increasing male survival has the opposite
effect. However, when females are rare, increasing female survival has no effect on
the proportion of juveniles. When males are rare, increases in male survival have no
effect on the proportion of juveniles. Increasing fecundity increases the proportion
of juveniles, at the expense of adults, in both sexes and for either mortality
pattern.

^16 In a survey of the literature, adult mortality for female passerines exceeded that for males in 21
out of 28 cases (Promislow et al. 1992). Birds differ from mammals in this respect.

Table 10.1 Elasticity of $\hat{\mathbf{p}}$ to parameters in two-sex model for passerine birds, under two mortality
scenarios. When male mortality is greater than female mortality, males are rarer than females and
fertility at equilibrium is limited by the mating function. When male mortality is less than female
mortality, females are rare and fertility is not affected by the mating function

Males rare
Stage   sigma_0    rho      sigma_1  sigma_2  sigma_3  sigma_4  phi_1    phi_2
p_1      0.455     0.453    -0.226   -0.229    0.000    0.000    0.266    0.189
p_2     -0.890     1.799     0.774    0.783   -0.398   -0.268   -0.521   -0.369
p_3      0.455    -1.547    -0.226   -0.229    0.000    0.000    0.266    0.189
p_4     -0.664    -0.428    -0.226   -0.229    0.669    0.450   -0.389   -0.275

Females rare
Stage   sigma_0    rho      sigma_1  sigma_2  sigma_3  sigma_4  phi_1    phi_2
p_1      0.455     1.547     0.000    0.000   -0.226   -0.229    0.320    0.135
p_2     -0.664     0.428     0.669    0.450   -0.226   -0.229   -0.467   -0.197
p_3      0.455    -0.453     0.000    0.000   -0.226   -0.229    0.320    0.135
p_4     -0.890    -1.799    -0.398   -0.268    0.774    0.783   -0.627   -0.264

The elasticity of the population growth rate $\lambda$ at equilibrium is shown in
Table 10.2, and is compared to the naive calculation that treats $\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]$ as a fixed
matrix. When males are rare, so that fertility is limited by the mating function, the
naive calculations are dramatically wrong. When calculated correctly, increases in
the primary sex ratio $\rho$ reduce $\lambda$, because they reduce the availability of males.
Increases in female survival have no effect on $\lambda$, because the extra females produced
have no opportunity to reproduce. Increases in male survival increase $\lambda$ because they
increase female fertility. In each case, the naive calculation leads, incorrectly, to the
opposite conclusion.
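
The qualitative pattern for the case where males are rare can be cross-checked without the analytical formula, by recomputing the equilibrium growth rate after a small proportional change in each parameter. The Python sketch below does this with central differences. It is a numerical check under stated assumptions, not an implementation of (10.97): the parameter names are hypothetical labels, the survival values are those of the scenario in which male mortality is higher, and the equilibrium is found by simple iteration.

```python
import numpy as np

# Scenario with higher male mortality; names are illustrative labels only.
BASE = dict(sigma0=0.2, rho=0.5, s1=0.35, s2=0.5, s3=0.25, s4=0.4,
            phi1=7.0, phi2=7.0)

def matrix(n, th):
    """Two-sex projection matrix with the minimum mating function."""
    Nf, Nm = n[0] + n[1], n[2] + n[3]
    m = min(Nf, Nm) / Nf                   # matings per female
    F1 = th["sigma0"] * th["phi1"] * m
    F2 = th["sigma0"] * th["phi2"] * m
    r = th["rho"]
    return np.array([[r * F1,       r * F2,       0.0,      0.0],
                     [th["s1"],     th["s2"],     0.0,      0.0],
                     [(1 - r) * F1, (1 - r) * F2, 0.0,      0.0],
                     [0.0,          0.0,          th["s3"], th["s4"]]])

def growth_rate(th, steps=500):
    """Equilibrium growth rate: iterate the structure, then sum A[p] p."""
    p = np.full(4, 0.25)
    for _ in range(steps):
        q = matrix(p, th) @ p
        p = q / q.sum()
    return (matrix(p, th) @ p).sum()

def elasticity(name, rel=1e-5):
    """Central-difference estimate of d log(lambda) / d log(parameter)."""
    up, down = dict(BASE), dict(BASE)
    up[name] *= 1 + rel
    down[name] *= 1 - rel
    return (growth_rate(up) - growth_rate(down)) / (2 * rel * growth_rate(BASE))

elas = {name: elasticity(name) for name in BASE}
for name, value in elas.items():
    print(f"{name:7s} {value: .3f}")
```

Because the equilibrium is recomputed after each perturbation, these estimates include the effect of parameters on $\hat{\mathbf{p}}$, which is exactly what the naive fixed-matrix calculation leaves out.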

When females are rare (which renders the model linear and female-dominant at
equilibrium), the correct and the naive calculations agree. This is a consequence
of using the minimum as a birth function. Some preliminary calculations using the
harmonic mean birth function,

$$B(\mathbf{n}) = \frac{2 N_f N_m}{N_f + N_m}, \tag{10.104}$$

in which both males and females influence fertility at all population structures,
suggest that the naive elasticity calculations are always incorrect.

Table 10.2 The elasticity of $\lambda$ to parameters in the two-sex model for passerine
birds, under two mortality scenarios. The correct calculation is based on (10.97). The naive
calculation incorrectly treats $\mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{p}}]$ as a fixed matrix, ignoring the effect of
parameters on the equilibrium population structure $\hat{\mathbf{p}}$

             Males rare           Females rare
Parameter    Correct   Naive      Correct   Naive
sigma_0       0.669    0.545       0.669    0.669
rho          -0.669    0.545       0.669    0.669
sigma_1       0        0.226       0.198    0.198
sigma_2       0        0.229       0.133    0.133
sigma_3       0.198    0           0        0
sigma_4       0.133    0           0        0
phi_1         0.392    0.319       0.471    0.471
phi_2         0.277    0.226       0.198    0.198

Sometimes the correct calculations lead to apparent paradoxes. Jenouvrier et al.
(2010) developed a two-sex model for the Emperor penguin. It was a periodic model,
with phases defined by events within the breeding cycle (cf. Chap. 8), and included a
mating function applied to adults at the breeding colony. Because Emperor penguins
breed, and share parental care, in the midst of the Antarctic winter,^17 they must be
strictly monogamous, and hence Jenouvrier used the minimum as a mating function.

Analysis of the equilibrium growth rate revealed that the sensitivity of $\lambda$ to adult
female survival was negative. This is impossible in a linear model, but happens in
this frequency-dependent model because increasing adult female survival increases
the proportion of females (already greater than the proportion of males) and thus
decreases mating probability. The negative effect of reduced mating overwhelms the
positive effect of improved adult survival; the net result is a reduction in population
growth rate; see Jenouvrier et al. (2010) for details.


10.6.3 The Birth Matrix-Mating Rule Model 


Pollak (1987, 1990) introduced a powerful conceptual approach to two-sex models, 
which he called the birth matrix-mating rule (BMMR) model. This model separates 
the processes of mating, birth, and life cycle stage transitions, and treats them as 
a periodic process. When generalized to stage-structured models, it contains three 
main components: 


1. A birth matrix whose entries give the expected number of male and female 
offspring produced by a mating of a male of age (or stage) i and a female of 
age j. 

2. A mating rule function that gives the number of matings $u_{ij}$ between males of
age (or stage) $i$ and females of age $j$.

3. A set of sex-specific mortality schedules, which project surviving individuals to 
the next age class, or, in our generalization, include other stage-specific life cycle 
transitions. 


^17 Dramatically portrayed in the movie March of the Penguins.

A matrix version of the BMMR has recently been developed, using a novel
continuous-time formulation of periodic matrix models (Shyu and Caswell 2018).
The mating, birth, and transition processes are described, respectively, by matrices
$\mathbf{U}$, $\mathbf{B}$, and $\mathbf{T}$. To explore the theoretical consequences of two-sex reproduction, the
matrices are parameterized in terms of continuous-time rates rather than discrete-time
probabilities. In continuous time, the periodic matrix product that would
describe such a process in discrete time converges to a sum of the rate matrices.
The dynamics of the population are given by

$$\frac{d\mathbf{n}(t)}{dt} = \mathbf{A}[\mathbf{n}(t)]\,\mathbf{n}(t) \tag{10.105}$$

where

$$\mathbf{A}[\mathbf{n}] = \frac{1}{3}\left( \mathbf{T} + \mathbf{B} + \mathbf{U}[\mathbf{n}(t)] \right) \tag{10.106}$$

That is, the projection matrix is the mean of the three component matrices, and is
nonlinear because of the dependence of union formation (the matrix $\mathbf{U}$) on $\mathbf{n}$. Shyu
and Caswell (2016a,b, 2018) explore this model in the context of sex ratio evolution
and of sex-biased harvesting, deriving the sensitivity of the population growth rate
as a measure of the selection gradient.


10.7 Sensitivity of Population Cycles 


Equilibria are not the only attractors relevant in nature (e.g., Clutton-Brock et al. 
1997) or the laboratory (Cushing et al. 2003). Cycles, invariant loops, and strange 
attractors also occur, and are sensitive to changes in parameters. This section 
examines the sensitivity of cycles. 


10.7.1 Sensitivity of the Population Vector 


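A $k$-cycle is a sequence of population vectors $\hat{\mathbf{n}}_1, \ldots, \hat{\mathbf{n}}_k$, satisfying

$$\begin{aligned} \hat{\mathbf{n}}_{i+1} &= \mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{n}}_i]\,\hat{\mathbf{n}}_i \qquad i = 1, \ldots, k-1 \\ \hat{\mathbf{n}}_1 &= \mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{n}}_k]\,\hat{\mathbf{n}}_k. \end{aligned} \tag{10.107}$$

A change in parameters will modify each point in the cycle; the first goal of
perturbation analysis is thus to find the sensitivities

$$\frac{d\hat{\mathbf{n}}_1}{d\boldsymbol{\theta}^{\top}}, \; \ldots, \; \frac{d\hat{\mathbf{n}}_k}{d\boldsymbol{\theta}^{\top}} \tag{10.108}$$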


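The following is the derivation of these sensitivities for a 2-cycle. The extension to
cycles of arbitrary length will follow. To simplify notation, define

$$\mathbf{A}_i = \mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{n}}_i]. \tag{10.109}$$

The 2-cycle satisfies

$$\hat{\mathbf{n}}_1 = \mathbf{A}_2 \hat{\mathbf{n}}_2 \tag{10.110}$$

$$\hat{\mathbf{n}}_2 = \mathbf{A}_1 \hat{\mathbf{n}}_1 \tag{10.111}$$

Differentiating both equations, applying the vec operator, and expanding
$d\mathrm{vec}\,\mathbf{A}_i/d\boldsymbol{\theta}^{\top}$ yields a system of equations

$$\frac{d\hat{\mathbf{n}}_1}{d\boldsymbol{\theta}^{\top}} = \left(\hat{\mathbf{n}}_2^{\top} \otimes \mathbf{I}_s\right) \frac{\partial \mathrm{vec}\,\mathbf{A}_2}{\partial \boldsymbol{\theta}^{\top}} + \left(\hat{\mathbf{n}}_2^{\top} \otimes \mathbf{I}_s\right) \frac{\partial \mathrm{vec}\,\mathbf{A}_2}{\partial \mathbf{n}^{\top}} \frac{d\hat{\mathbf{n}}_2}{d\boldsymbol{\theta}^{\top}} + \mathbf{A}_2 \frac{d\hat{\mathbf{n}}_2}{d\boldsymbol{\theta}^{\top}} \tag{10.112}$$

$$\frac{d\hat{\mathbf{n}}_2}{d\boldsymbol{\theta}^{\top}} = \left(\hat{\mathbf{n}}_1^{\top} \otimes \mathbf{I}_s\right) \frac{\partial \mathrm{vec}\,\mathbf{A}_1}{\partial \boldsymbol{\theta}^{\top}} + \left(\hat{\mathbf{n}}_1^{\top} \otimes \mathbf{I}_s\right) \frac{\partial \mathrm{vec}\,\mathbf{A}_1}{\partial \mathbf{n}^{\top}} \frac{d\hat{\mathbf{n}}_1}{d\boldsymbol{\theta}^{\top}} + \mathbf{A}_1 \frac{d\hat{\mathbf{n}}_1}{d\boldsymbol{\theta}^{\top}} \tag{10.113}$$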

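This system can be written in block matrix form. Define $\mathbf{H}_i = \hat{\mathbf{n}}_i^{\top} \otimes \mathbf{I}_s$. Then

$$\frac{d}{d\boldsymbol{\theta}^{\top}} \begin{pmatrix} \hat{\mathbf{n}}_1 \\ \hat{\mathbf{n}}_2 \end{pmatrix} = \begin{pmatrix} 0 & \mathbf{H}_2 \\ \mathbf{H}_1 & 0 \end{pmatrix} \begin{pmatrix} \dfrac{\partial \mathrm{vec}\,\mathbf{A}_1}{\partial \boldsymbol{\theta}^{\top}} \\[2ex] \dfrac{\partial \mathrm{vec}\,\mathbf{A}_2}{\partial \boldsymbol{\theta}^{\top}} \end{pmatrix} + \left[ \begin{pmatrix} 0 & \mathbf{H}_2 \\ \mathbf{H}_1 & 0 \end{pmatrix} \begin{pmatrix} \dfrac{\partial \mathrm{vec}\,\mathbf{A}_1}{\partial \mathbf{n}^{\top}} & 0 \\ 0 & \dfrac{\partial \mathrm{vec}\,\mathbf{A}_2}{\partial \mathbf{n}^{\top}} \end{pmatrix} + \begin{pmatrix} 0 & \mathbf{A}_2 \\ \mathbf{A}_1 & 0 \end{pmatrix} \right] \frac{d}{d\boldsymbol{\theta}^{\top}} \begin{pmatrix} \hat{\mathbf{n}}_1 \\ \hat{\mathbf{n}}_2 \end{pmatrix} \tag{10.114}$$

Solving for the sensitivities gives

$$\frac{d}{d\boldsymbol{\theta}^{\top}} \begin{pmatrix} \hat{\mathbf{n}}_1 \\ \hat{\mathbf{n}}_2 \end{pmatrix} = \left[ \mathbf{I}_{2s} - \begin{pmatrix} 0 & \mathbf{H}_2 \\ \mathbf{H}_1 & 0 \end{pmatrix} \begin{pmatrix} \dfrac{\partial \mathrm{vec}\,\mathbf{A}_1}{\partial \mathbf{n}^{\top}} & 0 \\ 0 & \dfrac{\partial \mathrm{vec}\,\mathbf{A}_2}{\partial \mathbf{n}^{\top}} \end{pmatrix} - \begin{pmatrix} 0 & \mathbf{A}_2 \\ \mathbf{A}_1 & 0 \end{pmatrix} \right]^{-1} \begin{pmatrix} 0 & \mathbf{H}_2 \\ \mathbf{H}_1 & 0 \end{pmatrix} \begin{pmatrix} \dfrac{\partial \mathrm{vec}\,\mathbf{A}_1}{\partial \boldsymbol{\theta}^{\top}} \\[2ex] \dfrac{\partial \mathrm{vec}\,\mathbf{A}_2}{\partial \boldsymbol{\theta}^{\top}} \end{pmatrix} \tag{10.115}$$

where the matrices $\mathbf{A}_i$ and the derivatives of $\mathbf{A}_i$ are all evaluated at $\hat{\mathbf{n}}_i$. The analogy
with (10.16) is apparent.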

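This calculation can be extended to cycles of any period, in terms of block
matrices as in (10.115). The pattern of the block matrices is clear from a 3-cycle.
Define the following matrices:

$$\mathbb{N} = \begin{pmatrix} \hat{\mathbf{n}}_1 \\ \hat{\mathbf{n}}_2 \\ \hat{\mathbf{n}}_3 \end{pmatrix} \tag{10.116}$$

$$\mathbb{A} = \begin{pmatrix} 0 & 0 & \mathbf{A}_3 \\ \mathbf{A}_1 & 0 & 0 \\ 0 & \mathbf{A}_2 & 0 \end{pmatrix} \tag{10.117}$$

$$\mathbb{H} = \begin{pmatrix} 0 & 0 & \mathbf{H}_3 \\ \mathbf{H}_1 & 0 & 0 \\ 0 & \mathbf{H}_2 & 0 \end{pmatrix} \tag{10.118}$$

$$\mathbb{C} = \begin{pmatrix} \dfrac{\partial \mathrm{vec}\,\mathbf{A}_1}{\partial \mathbf{n}^{\top}} & 0 & 0 \\ 0 & \dfrac{\partial \mathrm{vec}\,\mathbf{A}_2}{\partial \mathbf{n}^{\top}} & 0 \\ 0 & 0 & \dfrac{\partial \mathrm{vec}\,\mathbf{A}_3}{\partial \mathbf{n}^{\top}} \end{pmatrix} \tag{10.119}$$

$$\mathbb{D} = \begin{pmatrix} \dfrac{\partial \mathrm{vec}\,\mathbf{A}_1}{\partial \boldsymbol{\theta}^{\top}} \\[2ex] \dfrac{\partial \mathrm{vec}\,\mathbf{A}_2}{\partial \boldsymbol{\theta}^{\top}} \\[2ex] \dfrac{\partial \mathrm{vec}\,\mathbf{A}_3}{\partial \boldsymbol{\theta}^{\top}} \end{pmatrix} \tag{10.120}$$

In terms of these matrices, the sensitivity of each point in the 3-cycle is given by

$$\frac{d\mathbb{N}}{d\boldsymbol{\theta}^{\top}} = \left[ \mathbf{I}_{sk} - \mathbb{A} - \mathbb{H}\mathbb{C} \right]^{-1} \mathbb{H}\mathbb{D}. \tag{10.121}$$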


10.7.2 Sensitivity of Weighted Densities and Time Averages 


The matrix dN Jao" contains the sensitivity of every stage to every parameter at 
every point in the cycle. This potential overload of information can be simplified 
by calculating the sensitivities of weighted densities and/or time averages over the 
cycle. To do this, it is convenient to write the points in the cycle as an array (of 
dimension s x k) 


G= (nj ñ2 --- Âk). (10.122) 
The block vector N is 
N = vec G. (10.123) 


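Weighted densities. Let $\mathbf{c}$ be a vector of weights, and let $\hat{N}_i = \mathbf{c}^{\top}\hat{\mathbf{n}}_i$ be the (scalar) weighted density at the $i$th point on the cycle. Then write

$$\hat{\mathbf{N}} = \begin{pmatrix} \hat{N}_1 \\ \vdots \\ \hat{N}_k \end{pmatrix} \tag{10.124}$$

$$\begin{aligned} \hat{\mathbf{N}} &= \begin{pmatrix} \mathbf{c}^{\top}\hat{\mathbf{n}}_1 & \cdots & \mathbf{c}^{\top}\hat{\mathbf{n}}_k \end{pmatrix}^{\top} \\ &= \mathrm{vec}\left(\mathbf{c}^{\top}\mathbf{G}\right) \\ &= \left(\mathbf{I}_k \otimes \mathbf{c}^{\top}\right) \mathrm{vec}\,\mathbf{G} \\ &= \left(\mathbf{I}_k \otimes \mathbf{c}^{\top}\right) \mathcal{N} \qquad \text{dimension} = k \times 1. \end{aligned} \tag{10.125}$$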


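Time-averaged population vector. Let $\mathbf{b}$ be a probability vector ($b_i \ge 0$, $\mathbf{1}^{\top}\mathbf{b} = 1$) and define the time-averaged population vector as

$$\bar{\mathbf{n}} = \sum_{i=1}^{k} b_i \hat{\mathbf{n}}_i. \tag{10.126}$$

Then

$$\begin{aligned} \bar{\mathbf{n}} &= \mathbf{G}\mathbf{b} \\ &= \left(\mathbf{b}^{\top} \otimes \mathbf{I}_s\right) \mathrm{vec}\,\mathbf{G} \\ &= \left(\mathbf{b}^{\top} \otimes \mathbf{I}_s\right) \mathcal{N} \qquad \text{dimension} = s \times 1. \end{aligned} \tag{10.127}$$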


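Time-averaged weighted density. Taking the time average of the $\hat{N}_i$ gives

$$\begin{aligned} \bar{N} &= \sum_i b_i \mathbf{c}^{\top}\hat{\mathbf{n}}_i \\ &= \mathbf{c}^{\top}\mathbf{G}\mathbf{b} \\ &= \left(\mathbf{b}^{\top} \otimes \mathbf{c}^{\top}\right) \mathcal{N}. \end{aligned} \tag{10.128}$$

Thus the sensitivities of the weighted densities, the time-averaged population, and the time-averaged weighted density are obtained by differentiating (10.125), (10.127), and (10.128) as

$$\frac{d\hat{\mathbf{N}}}{d\boldsymbol{\theta}^{\top}} = \left(\mathbf{I}_k \otimes \mathbf{c}^{\top}\right) \frac{d\mathcal{N}}{d\boldsymbol{\theta}^{\top}} \tag{10.129}$$

$$\frac{d\bar{\mathbf{n}}}{d\boldsymbol{\theta}^{\top}} = \left(\mathbf{b}^{\top} \otimes \mathbf{I}_s\right) \frac{d\mathcal{N}}{d\boldsymbol{\theta}^{\top}} \tag{10.130}$$

$$\frac{d\bar{N}}{d\boldsymbol{\theta}^{\top}} = \left(\mathbf{b}^{\top} \otimes \mathbf{c}^{\top}\right) \frac{d\mathcal{N}}{d\boldsymbol{\theta}^{\top}} \tag{10.131}$$

where $d\mathcal{N}/d\boldsymbol{\theta}^{\top}$ is given by (10.121).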
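The Kronecker-product forms (10.125), (10.127), and (10.128) can be checked numerically. The sketch below is only an illustration: it uses the two cycle points reported in Example 7 as the columns of G, and the weights c (total population) and b (equal time weights) are assumptions chosen for the example.

```python
import numpy as np

# Columns of G are the points on the cycle (values from Example 7, Eq. 10.132).
G = np.array([[325.3, 18.2],
              [8.9, 158.4],
              [118.5, 106.4]])
s, k = G.shape
N = G.reshape(-1, order="F")          # block vector N = vec G  (10.123)

c = np.ones(s)                        # assumed weights: total population
b = np.full(k, 1.0 / k)               # assumed time weights: equal

# Weighted density at each point, time-averaged vector, time-averaged density
N_hat = np.kron(np.eye(k), c[None, :]) @ N      # (10.125), length k
n_bar = np.kron(b[None, :], np.eye(s)) @ N      # (10.127), length s
N_bar = np.kron(b[None, :], c[None, :]) @ N     # (10.128), length 1

print(N_hat, n_bar, N_bar)
```

Because each quantity is a constant matrix times the block vector, its sensitivity is the same matrix applied to the sensitivity of the block vector, as in (10.129) to (10.131).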


Example 7 A 2-cycle in the Tribolium model. A series of experiments on Tribolium reported by Dennis et al. (1995) produced stable 2-cycles by experimentally manipulating the adult mortality $\mu_a$. Using the model in Example 2 and the estimated parameters

$$\begin{aligned} b &= 11.677 & c_{pa} &= 1.78 \times 10^{-2} \\ c_{ea} &= 1.100 \times 10^{-2} & \mu_a &= 1.108 \times 10^{-1} \\ c_{el} &= 9.3 \times 10^{-3} & \mu_l &= 5.129 \times 10^{-1} \end{aligned}$$

(Dennis et al. 1995, Table 1) leads to a 2-cycle

$$\hat{\mathbf{n}}_1 = \begin{pmatrix} 325.3 \\ 8.9 \\ 118.5 \end{pmatrix} \qquad \hat{\mathbf{n}}_2 = \begin{pmatrix} 18.2 \\ 158.4 \\ 106.4 \end{pmatrix}, \tag{10.132}$$


in which the population oscillates between a state dominated by larvae and adults 
and a state dominated by pupae and adults. 
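The block-matrix result (10.115) can be sketched numerically for this example. The code below is a hedged illustration, not the computation used for Fig. 10.8: it assumes the standard larva-pupa-adult (LPA) form of the Tribolium model for the matrix A[θ, n] of Example 2, approximates the derivative matrices by central differences rather than analytical expressions, and refines the rounded cycle in (10.132) by Newton iteration. The function names are hypothetical.

```python
import numpy as np

def lpa_matrix(theta, n):
    """Projection matrix A[theta, n]; the LPA form is an assumption about Example 2."""
    b, cea, cel, cpa, mua, mul = theta
    L, P, A = n
    return np.array([[0.0, 0.0, b * np.exp(-cea * A - cel * L)],
                     [1.0 - mul, 0.0, 0.0],
                     [0.0, np.exp(-cpa * A), 1.0 - mua]])

def vec(M):
    return M.reshape(-1, order="F")       # stack columns

def jac(f, x, rel=1e-6):
    """Central-difference Jacobian of f at x (one column per entry of x)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = rel * max(1.0, abs(x[j]))
        cols.append((f(x + step) - f(x - step)) / (2.0 * step[j]))
    return np.column_stack(cols)

def cycle_blocks(theta, pts):
    """Block matrices A, H, C, D of (10.117)-(10.120) for a k-cycle."""
    k, s = pts.shape
    A = np.zeros((k * s, k * s))
    H = np.zeros((k * s, k * s * s))
    C = np.zeros((k * s * s, k * s))
    D = []
    for i, n in enumerate(pts):
        r = (i + 1) % k                   # point i is projected to point i + 1
        A[r*s:(r+1)*s, i*s:(i+1)*s] = lpa_matrix(theta, n)
        H[r*s:(r+1)*s, i*s*s:(i+1)*s*s] = np.kron(n[None, :], np.eye(s))
        C[i*s*s:(i+1)*s*s, i*s:(i+1)*s] = jac(lambda x: vec(lpa_matrix(theta, x)), n)
        D.append(jac(lambda t: vec(lpa_matrix(t, n)), theta))
    return A, H, C, np.vstack(D)

def solve_cycle(theta, pts, iters=20):
    """Newton refinement of the cycle; the Jacobian is I - A - HC."""
    pts = np.array(pts, dtype=float)
    k, s = pts.shape
    for _ in range(iters):
        A, H, C, _ = cycle_blocks(theta, pts)
        N = pts.reshape(-1)
        N = N - np.linalg.solve(np.eye(k * s) - A - H @ C, N - A @ N)
        pts = N.reshape(k, s)
    return pts

def cycle_sensitivity(theta, pts):
    """dN/dtheta^T = [I - A - HC]^{-1} HD, as in (10.115) and (10.121)."""
    A, H, C, D = cycle_blocks(theta, pts)
    return np.linalg.solve(np.eye(A.shape[0]) - A - H @ C, H @ D)

# Parameter order: b, c_ea, c_el, c_pa, mu_a, mu_l
THETA = np.array([11.677, 1.100e-2, 9.3e-3, 1.78e-2, 1.108e-1, 5.129e-1])
reported = np.array([[325.3, 8.9, 118.5], [18.2, 158.4, 106.4]])
cycle = solve_cycle(THETA, reported)
S = cycle_sensitivity(THETA, cycle)                  # 6 x 6 sensitivities
E = S * THETA[None, :] / cycle.reshape(-1)[:, None]  # elasticities
print(np.round(cycle, 1))
print(np.round(E, 2))
```

The same three functions apply to a cycle of any period k, because the block pattern is built by index arithmetic rather than written out by hand.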

As an example of the rich sensitivity analyses possible for even such a simple model, consider the elasticity of the population vectors $\hat{\mathbf{n}}_i$, of the total population $\hat{N}_i = \mathbf{1}^{\top}\hat{\mathbf{n}}_i$, of the total population respiration $\hat{R}_i = \mathbf{c}^{\top}\hat{\mathbf{n}}_i$ (with $\mathbf{c}$ the vector of stage-specific respiration rates from Example 2), and of the time averages $\bar{\mathbf{n}}$, $\bar{N}$, and $\bar{R}$. The results are collected in Fig. 10.8.

First, the elasticities of the $\hat{\mathbf{n}}_i$ differ from stage to stage and from one point on the cycle to another (Fig. 10.8a). Increases in fecundity, for example, increase the density of larvae and reduce the density of pupae in $\hat{\mathbf{n}}_1$, but have the opposite effects in $\hat{\mathbf{n}}_2$. The elasticities to $b$, $c_{ea}$, and $c_{el}$ are much larger than those to the other parameters (cf. the elasticities of the equilibrium $\hat{\mathbf{n}}$ in Fig. 10.1).

The elasticities of total population are similar at the two points in the cycle (Fig. 10.8b), except that larval mortality $\mu_l$ has a large negative effect on total population at one point in the cycle but only a small effect at the other. The elasticities of total respiration $\hat{R}_i$, however, are different at the two points in the cycle (Fig. 10.8c).

The elasticities of the time-averaged population vector $\bar{\mathbf{n}}$ (Fig. 10.8d) are similar to those of the equilibrium vector in Fig. 10.1 (although they need not be). This pattern is not predictable from the patterns of the elasticities of the population vectors $\hat{\mathbf{n}}_1$ and $\hat{\mathbf{n}}_2$ (Fig. 10.8a).

Finally, the elasticities of the time averages, $\bar{N}$ and $\bar{R}$, of the weighted densities are similar to each other and to the elasticities of the time-averaged population $\bar{\mathbf{n}}$.

The sensitivity analysis of cycles thus depends very much on the dependent variables of interest. The matrix $d\mathcal{N}/d\boldsymbol{\theta}^{\top}$ (Fig. 10.8a) contains 36 pieces of information: the effects of 6 parameters on 3 stages at 2 points in the cycle. A focus on weighted density reduces this to 12 (Fig. 10.8b,c), but the results may depend very much on the particular weighting vector chosen. A focus on time averages reduces the information from 36 to 18 numbers (Fig. 10.8d), and the response of the time-averaged weighted densities is finally described by just 6 numbers. The good news is that Eqs. (10.121), (10.125), (10.127), and (10.128) make it easy to compute all these sensitivities.

Fig. 10.8 Analysis of a 2-cycle in the Tribolium model. (a) Elasticity of the density of each stage, with respect to each parameter, at $\hat{\mathbf{n}}_1$ and $\hat{\mathbf{n}}_2$. (b) Elasticity of the total population $\hat{N}$ at each point in the cycle. (c) Elasticity of the total respiration $\hat{R}$ at each point in the cycle. (d) Elasticity of the time-averaged population $\bar{\mathbf{n}}$. (e) Elasticity of the time-averaged total population $\bar{N}$ and the time-averaged total respiration $\bar{R}$


10.7.3 Sensitivity of Temporal Variance in Density 
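The variance over a cycle in a weighted density $\hat{N}$ can be written

$$V(\hat{N}) = E(\hat{N}^2) - \left[E(\hat{N})\right]^2 \tag{10.133}$$

where $E(\hat{N}) = \bar{N} = \mathbf{c}^{\top}\mathbf{G}\mathbf{b}$ and

$$E(\hat{N}^2) = \sum_{i=1}^{k} b_i \left(\mathbf{c}^{\top}\hat{\mathbf{n}}_i\right)^2 \tag{10.134}$$

$$= \mathbf{b}^{\top}\left(\hat{\mathbf{N}} \circ \hat{\mathbf{N}}\right), \tag{10.135}$$

with $\hat{\mathbf{N}} = (\mathbf{I}_k \otimes \mathbf{c}^{\top})\mathcal{N}$ the vector of weighted densities from (10.125). Taking the differential of $E(\hat{N}^2)$ gives

$$dE(\hat{N}^2) = 2 \left(\mathbf{b} \circ \hat{\mathbf{N}}\right)^{\top} \left(\mathbf{I}_k \otimes \mathbf{c}^{\top}\right) d\mathcal{N}. \tag{10.136}$$

Combining this with the differential of $[E(\hat{N})]^2$, which is $2\bar{N}\,(\mathbf{b}^{\top} \otimes \mathbf{c}^{\top})\, d\mathcal{N}$, gives the sensitivity of $V(\hat{N})$:

$$\frac{dV(\hat{N})}{d\boldsymbol{\theta}^{\top}} = 2 \left\{ \left(\mathbf{b} \circ \hat{\mathbf{N}}\right)^{\top} \left(\mathbf{I}_k \otimes \mathbf{c}^{\top}\right) - \bar{N} \left(\mathbf{b}^{\top} \otimes \mathbf{c}^{\top}\right) \right\} \frac{d\mathcal{N}}{d\boldsymbol{\theta}^{\top}} \tag{10.137}$$

where $d\mathcal{N}/d\boldsymbol{\theta}^{\top}$ is given by (10.121). The extension to higher moments is possible, should one want to know, say, the sensitivity of the skewness of population size over a cycle.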
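A short numerical sketch of the variance calculation follows. It writes the second moment as the b-weighted sum of squared weighted densities, builds the row vector that multiplies the sensitivity of the block vector, and compares it with a finite-difference gradient. The cycle points are those reported in Example 7; the weights c and b are assumptions made for the illustration.

```python
import numpy as np

G = np.array([[325.3, 18.2],
              [8.9, 158.4],
              [118.5, 106.4]])            # cycle points from Example 7
s, k = G.shape
c = np.ones(s)                            # assumed weights: total population
b = np.full(k, 1.0 / k)                   # assumed equal time weights
W = np.kron(np.eye(k), c[None, :])        # I_k (x) c^T, maps vec G to weighted densities

def variance(N):
    """Temporal variance of the weighted density, given the block vector N = vec G."""
    N_hat = W @ N
    return b @ N_hat**2 - (b @ N_hat)**2

def variance_gradient(N):
    """Row vector dV/dN^T; multiply by dN/dtheta^T to obtain dV/dtheta^T."""
    N_hat = W @ N
    N_bar = b @ N_hat
    return 2.0 * ((b * N_hat) @ W - N_bar * np.kron(b, c))

N0 = G.reshape(-1, order="F")
print(variance(N0), variance_gradient(N0))
```

With equal time weights and a 2-cycle, the variance is simply the squared half-difference of the two weighted densities, which gives a quick sanity check on the output.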


10.7.4 Periodic Dynamics in Periodic Environments 


Periodic environments (e.g., seasons within a year) are described by periodic products of matrices. If the environmental cycle contains $p$ phases, then matrices $\mathbf{A}_1, \ldots, \mathbf{A}_p$ describe the dynamics at each phase, and the periodic product $\mathbf{A}_p \cdots \mathbf{A}_1$ projects the population over an entire cycle. Nonlinear periodic models permit the $\mathbf{A}_i$ to depend on the population vector at any point in the cycle, including delayed dependence (e.g., the reproductive success of an individual plant in the fall may depend on the density it experienced in the spring). A fixed point on the inter-annual time scale is a $p$-cycle on the seasonal time scale. A $k$-cycle on the inter-annual scale corresponds to a $kp$-cycle on the seasonal time scale. The sensitivity analysis of these models is given by Caswell and Shyu (2012) and presented here in Chap. 8. For an application to the dynamics of an invasive plant population, see Shyu et al. (2013).


10.8 Dynamic Environmental Feedback Models 


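The commonly encountered forms of density dependence are usually a shorthand for a feedback between a population and some aspect of its environment.¹⁸ The static feedback model of Sect. 10.3 begins to incorporate environmental feedback, but assumes that the environmental variable $\mathbf{g}(t)$ has no inherent dynamics of its own. A more general, dynamic environmental feedback model can be written

$$\begin{aligned} \mathbf{n}(t+1) &= \mathbf{A}[\boldsymbol{\theta}, \mathbf{n}(t), \mathbf{g}(t)]\, \mathbf{n}(t) \\ \mathbf{g}(t+1) &= \mathbf{B}[\boldsymbol{\theta}, \mathbf{n}(t), \mathbf{g}(t)]\, \mathbf{g}(t) \end{aligned} \tag{10.138}$$

allowing $\mathbf{n}(t)$ to depend on both the environment and its own density, and likewise for the environmental factor.

The sensitivity of the equilibrium of (10.138) can be found using an approach similar to that applied above to cycles. At equilibrium,

$$\hat{\mathbf{n}} = \mathbf{A}[\boldsymbol{\theta}, \hat{\mathbf{n}}, \hat{\mathbf{g}}]\, \hat{\mathbf{n}} \tag{10.139}$$

$$\hat{\mathbf{g}} = \mathbf{B}[\boldsymbol{\theta}, \hat{\mathbf{n}}, \hat{\mathbf{g}}]\, \hat{\mathbf{g}} \tag{10.140}$$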


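Differentiating both sides of each equation, applying the vec operator, and expanding $d\,\mathrm{vec}\,\mathbf{A}$ and $d\,\mathrm{vec}\,\mathbf{B}$ gives

$$d\hat{\mathbf{n}} = \mathbf{A}\,(d\hat{\mathbf{n}}) + \left(\hat{\mathbf{n}}^{\top} \otimes \mathbf{I}_s\right) \left( \frac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\top}}\, d\boldsymbol{\theta} + \frac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\top}}\, d\hat{\mathbf{n}} + \frac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\top}}\, d\hat{\mathbf{g}} \right) \tag{10.141}$$

$$d\hat{\mathbf{g}} = \mathbf{B}\,(d\hat{\mathbf{g}}) + \left(\hat{\mathbf{g}}^{\top} \otimes \mathbf{I}_q\right) \left( \frac{\partial\,\mathrm{vec}\,\mathbf{B}}{\partial \boldsymbol{\theta}^{\top}}\, d\boldsymbol{\theta} + \frac{\partial\,\mathrm{vec}\,\mathbf{B}}{\partial \mathbf{n}^{\top}}\, d\hat{\mathbf{n}} + \frac{\partial\,\mathrm{vec}\,\mathbf{B}}{\partial \mathbf{g}^{\top}}\, d\hat{\mathbf{g}} \right) \tag{10.142}$$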


Applying the identification theorem and the chain rule gives 


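$$\frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\top}} = \mathbf{A} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\top}} + \left(\hat{\mathbf{n}}^{\top} \otimes \mathbf{I}_s\right) \frac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\top}} + \left(\hat{\mathbf{n}}^{\top} \otimes \mathbf{I}_s\right) \frac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\top}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\top}} + \left(\hat{\mathbf{n}}^{\top} \otimes \mathbf{I}_s\right) \frac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\top}} \frac{d\hat{\mathbf{g}}}{d\boldsymbol{\theta}^{\top}} \tag{10.143}$$

with a similar expression for $d\hat{\mathbf{g}}/d\boldsymbol{\theta}^{\top}$. All matrices and their derivatives are evaluated at the equilibrium $(\hat{\mathbf{n}}, \hat{\mathbf{g}})$. This system can be written in block matrix form by defining

$$\mathbf{H} = \hat{\mathbf{n}}^{\top} \otimes \mathbf{I}_s \tag{10.144}$$

$$\mathbf{J} = \hat{\mathbf{g}}^{\top} \otimes \mathbf{I}_q \tag{10.145}$$

Then define

$$\mathbb{A} = \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \tag{10.146}$$

$$\mathbb{H} = \begin{pmatrix} \mathbf{H} & \mathbf{0} \\ \mathbf{0} & \mathbf{J} \end{pmatrix} \tag{10.147}$$

$$\mathbb{C} = \begin{pmatrix} \dfrac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \mathbf{n}^{\top}} & \dfrac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \mathbf{g}^{\top}} \\ \dfrac{\partial\,\mathrm{vec}\,\mathbf{B}}{\partial \mathbf{n}^{\top}} & \dfrac{\partial\,\mathrm{vec}\,\mathbf{B}}{\partial \mathbf{g}^{\top}} \end{pmatrix} \tag{10.148}$$

$$\mathbb{D} = \begin{pmatrix} \dfrac{\partial\,\mathrm{vec}\,\mathbf{A}}{\partial \boldsymbol{\theta}^{\top}} \\ \dfrac{\partial\,\mathrm{vec}\,\mathbf{B}}{\partial \boldsymbol{\theta}^{\top}} \end{pmatrix} \tag{10.149}$$

$$\mathcal{N} = \begin{pmatrix} \hat{\mathbf{n}} \\ \hat{\mathbf{g}} \end{pmatrix} \tag{10.150}$$

In terms of these matrices,

$$\frac{d\mathcal{N}}{d\boldsymbol{\theta}^{\top}} = \mathbb{H}\mathbb{D} + \left(\mathbb{A} + \mathbb{H}\mathbb{C}\right) \frac{d\mathcal{N}}{d\boldsymbol{\theta}^{\top}}. \tag{10.151}$$

Solving for $d\mathcal{N}/d\boldsymbol{\theta}^{\top}$ gives the sensitivity of both the population and the environmental factor,

$$\frac{d\mathcal{N}}{d\boldsymbol{\theta}^{\top}} = \left(\mathbf{I}_{s+q} - \mathbb{A} - \mathbb{H}\mathbb{C}\right)^{-1} \mathbb{H}\mathbb{D}. \tag{10.152}$$

¹⁸ Early writers even interpreted the simple logistic equation as an interplay between a biotic potential for exponential growth and an environmental resistance due to lack of resources or interaction with predators (e.g., Chapman 1931). Incorporating a fully dynamic feedback greatly expands the range of phenomena that can be explained (see de Roos and Persson (2013) for an extensive development of this approach).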


10.9 Stage-Structured Epidemics 


The transmission of infectious diseases is a source of nonlinearity because the 
rate of transmission depends on the abundance of infected and non-infected 
individuals. When demographic structure is added to the picture, the models can 
become complicated because the transmission process, the recovery process, and 
the consequences of infection may all vary among age classes or stages. 

Klepac and Caswell (2011) developed a general framework for stage-classified 
epidemics, using the vec-permutation formulation (e.g., Chaps. 5 and 6). Individuals 
were jointly classified by stage and infection category, and nonlinearity was introduced by the disease transmission process. Klepac and Caswell (2011) calculated sensitivities and elasticities of equilibria and cycles of the stage × infection distribution, and of stage-specific prevalence, to parameters specifying demographic, infection, and recovery processes.

Coupling demography and epidemiology requires attention to time scales. 
Suppose that the demographic processes operate on one time scale: say, years. For 
some diseases, the infection/recovery process might operate on a much longer time 
scale (decades). Or the disease might play out on a much shorter time scale (weeks). 
When the disease time scale is shorter than the demographic time scale, the matrices 
in Klepac’s model that define disease transmission operate many times within a 
single year; the result is a periodic model on the infection time scale. See Klepac 
and Caswell (2011) for details. 


10.10 Moments of Longevity in Nonlinear Models 


The statistics of longevity (e.g., life expectancy) are traditionally calculated from 
linear age-classified models (see Chap. 4) or from linear stage-classified models (see 
Chap. 5). In a nonlinear model at equilibrium, the projection matrix is constant and 
an individual experiences a fixed schedule of vital rates, from which all the usual 
statistics of longevity can be calculated. Write the density-dependent projection 
matrix as 


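$$\mathbf{A}[\boldsymbol{\theta}, \mathbf{n}] = \mathbf{U}[\boldsymbol{\theta}, \mathbf{n}] + \mathbf{F}[\boldsymbol{\theta}, \mathbf{n}] \tag{10.153}$$

where $\mathbf{U}$ contains the transition probabilities for individuals already present in the population and $\mathbf{F}$ describes the production of new individuals by reproduction. The matrix $\mathbf{U}$ is the transient matrix of an absorbing Markov chain, with death as an absorbing state. The fundamental matrix of this chain at equilibrium is

$$\mathbf{N}[\boldsymbol{\theta}, \hat{\mathbf{n}}] = \left(\mathbf{I}_s - \mathbf{U}[\boldsymbol{\theta}, \hat{\mathbf{n}}]\right)^{-1} \tag{10.154}$$

where the inverse is guaranteed to exist if the spectral radius of $\mathbf{U}$ is less than 1. The $(i, j)$ element of $\mathbf{N}$ is the expected time spent in stage $i$, before death, by an individual in stage $j$.

As in Chap. 4, the vector $\boldsymbol{\eta}_1$ containing the mean longevity of each age class or stage is given by

$$\boldsymbol{\eta}_1^{\top} = \mathbf{1}^{\top} \mathbf{N}[\boldsymbol{\theta}, \hat{\mathbf{n}}]. \tag{10.155}$$

The moments of longevity and other indices are calculated from $\mathbf{N}[\boldsymbol{\theta}, \hat{\mathbf{n}}]$ just as in the linear case. All the sensitivity results of Chaps. 4 and 5 apply directly, except that the derivative of $\mathbf{N}[\boldsymbol{\theta}, \hat{\mathbf{n}}]$ must include both the direct effects of $\boldsymbol{\theta}$ and the indirect effects through $\hat{\mathbf{n}}$. For convenience, write $\hat{\mathbf{N}}$ and $\hat{\mathbf{U}}$ for the matrices at equilibrium. Then

$$d\,\mathrm{vec}\,\hat{\mathbf{N}} = \left(\hat{\mathbf{N}}^{\top} \otimes \hat{\mathbf{N}}\right) d\,\mathrm{vec}\,\hat{\mathbf{U}} \tag{10.156}$$

$$= \left(\hat{\mathbf{N}}^{\top} \otimes \hat{\mathbf{N}}\right) \left[ \frac{\partial\,\mathrm{vec}\,\mathbf{U}}{\partial \boldsymbol{\theta}^{\top}}\, d\boldsymbol{\theta} + \frac{\partial\,\mathrm{vec}\,\mathbf{U}}{\partial \mathbf{n}^{\top}} \frac{d\hat{\mathbf{n}}}{d\boldsymbol{\theta}^{\top}}\, d\boldsymbol{\theta} \right] \tag{10.157}$$

where $\hat{\mathbf{U}}$, $\hat{\mathbf{N}}$, and the derivatives of $\mathbf{U}$ are all evaluated at equilibrium and $d\hat{\mathbf{n}}/d\boldsymbol{\theta}^{\top}$ is given by (10.16). Comparing this with equation (4.34) shows that the nonlinearity adds an extra term, capturing the way that changes in $\boldsymbol{\theta}$ affect the vital rates through changes in equilibrium density.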

This approach can be used to generalize the results for higher moments of 
longevity (Chaps. 4, 5, and 11) to the nonlinear case. 


10.11 Summary 


Table 10.3 lists the perturbation analysis results in this chapter; they comprise 
a fairly complete analysis for nonlinear demographic models. The nonlinearities 
may arise from density dependence, frequency dependence, environmental feed- 
back, proportional population structure calculations, or recruitment subsidy. The 
sensitivity calculations accommodate a wide range of dependent variables and the 
calculation of both sensitivity and elasticity with respect to any kind of demographic 
parameters. 

As in other chapters, most of the results in this chapter follow a straightforward 
method: 


1. Write the model, specifying the dependence of the vital rates on $\boldsymbol{\theta}$ and $\mathbf{n}$.

2. Write a matrix expression for the demographic outcome of interest (e.g., the equilibrium population).

3. Differentiate this expression.

4. Use the vec operator and Roth's theorem to obtain an expression that involves only the differentials of vectors.

5. Use the chain rule for total differentials to expand the operators (e.g., $d\,\mathrm{vec}\,\mathbf{A}$) that are functions of both $\boldsymbol{\theta}$ and $\mathbf{n}$, as in (10.14).

6. Use the first identification theorem and the chain rule to extend the results to sensitivities of any desired dependent variable with respect to any set of parameters.


Part V
Markov Chains


Chapter 11
Sensitivity Analysis of Discrete Markov Chains


11.1 Introduction 


As we have seen repeatedly, Markov chains are often used as mathematical models 
of demographic (as well as other natural) phenomena, with transition probabilities 
defined in terms of parameters that are of interest in the scientific question at 
hand. Sensitivity analysis is an important way to quantify the effects of changes 
in these parameters on the behavior of the chain. This chapter revisits, in a more 
rigorous way, some of the quantities already explored for absorbing Markov chains 
(Chaps. 4, 5, and 6). It will also consider ergodic Markov chains (in which no 
absorbing states exist), and calculate the sensitivity of the stationary distribution 
and measures of the rate of convergence. 

Perturbation (or sensitivity) analysis is a long-standing problem in the theory 
of Markov chains (Schweitzer 1968; Conlisk 1985; Golub and Meyer 1986; 
Funderlic and Meyer 1986; Seneta 1988, 1993; Meyer 1994; Cho and Meyer 2000; 
Mitrophanov 2003, 2005; Mitrophanov et al. 2005; Kirkland et al. 2008). When 
Markov chains are applied as models of physical, biological, or social systems, they 
are often defined as functions of parameters that have substantive meaning. 


Chapter 11 is modified, under the terms of a Journal Publishing Agreement with Elsevier Publishers, from: Caswell, H. Sensitivity analysis of discrete Markov chains via matrix calculus.


11.2 Absorbing Chains


The transition matrix for a discrete-time absorbing chain can be written

$$\mathbf{P} = \begin{pmatrix} \mathbf{U} & \mathbf{0} \\ \mathbf{M} & \mathbf{I} \end{pmatrix} \tag{11.1}$$

where $\mathbf{U}$, of dimension $s \times s$, is the transition matrix among the $s$ transient states, and $\mathbf{M}$, of dimension $a \times s$, contains probabilities of transition from the transient states to the $a$ absorbing states. Assume that the spectral radius of $\mathbf{U}$ is strictly less than 1. Because we are concerned here with absorption, but not what happens after, we ignore transitions among absorbing states; hence the identity matrix ($a \times a$) in the lower right corner. The matrices $\mathbf{U}[\boldsymbol{\theta}]$ and $\mathbf{M}[\boldsymbol{\theta}]$ are functions of a vector $\boldsymbol{\theta}$ of parameters. We assume that $\boldsymbol{\theta}$ varies over some set in which the column sums of $\mathbf{P}$ are 1 and the spectral radius of $\mathbf{U}$ is strictly less than one.


11.2.1 Occupancy: Visits to Transient States 
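Let $\nu_{ij}$ be the number of visits to transient state $i$, prior to absorption, by an individual starting in transient state $j$. The expectations of the $\nu_{ij}$ are the entries of the fundamental matrix $\mathbf{N} = \mathbf{N}_1 = \left(E(\nu_{ij})\right)$:

$$\mathbf{N} = (\mathbf{I} - \mathbf{U})^{-1} \tag{11.2}$$

(e.g., Kemeny and Snell 1960; Iosifescu 1980). Let $\mathbf{N}_k = \left(E(\nu_{ij}^k)\right)$ be a matrix containing the $k$th moments about the origin of the $\nu_{ij}$. The first several of these matrices are (Iosifescu 1980, Thm. 3.1)

$$\mathbf{N}_1 = (\mathbf{I} - \mathbf{U})^{-1} \tag{11.3}$$

$$\mathbf{N}_2 = \left(2\mathbf{N}_{\rm dg} - \mathbf{I}\right) \mathbf{N}_1 \tag{11.4}$$

$$\mathbf{N}_3 = \left(6\mathbf{N}_{\rm dg}^2 - 6\mathbf{N}_{\rm dg} + \mathbf{I}\right) \mathbf{N}_1 \tag{11.5}$$

$$\mathbf{N}_4 = \left(24\mathbf{N}_{\rm dg}^3 - 36\mathbf{N}_{\rm dg}^2 + 14\mathbf{N}_{\rm dg} - \mathbf{I}\right) \mathbf{N}_1, \tag{11.6}$$

where $\mathbf{N}_{\rm dg} = \mathbf{I} \circ \mathbf{N}_1$ contains the diagonal of $\mathbf{N}_1$ on its diagonal and zeros elsewhere.


Theorem 11.2.1 Let $\mathbf{N}_k$ be the matrix of $k$th moments of the $\nu_{ij}$, as given by (11.3), (11.4), (11.5), and (11.6). The sensitivities of $\mathbf{N}_k$, for $k = 1, \ldots, 4$ are

$$d\,\mathrm{vec}\,\mathbf{N}_1 = \left(\mathbf{N}_1^{\top} \otimes \mathbf{N}_1\right) d\,\mathrm{vec}\,\mathbf{U} \tag{11.7}$$

$$d\,\mathrm{vec}\,\mathbf{N}_2 = \left[2\left(\mathbf{I} \otimes \mathbf{N}_{\rm dg}\right) - \mathbf{I}_{s^2}\right] d\,\mathrm{vec}\,\mathbf{N}_1 + 2\left(\mathbf{N}_1^{\top} \otimes \mathbf{I}\right) d\,\mathrm{vec}\,\mathbf{N}_{\rm dg} \tag{11.8}$$

$$\begin{aligned} d\,\mathrm{vec}\,\mathbf{N}_3 ={}& \left[\mathbf{I} \otimes \left(6\mathbf{N}_{\rm dg}^2 - 6\mathbf{N}_{\rm dg} + \mathbf{I}\right)\right] d\,\mathrm{vec}\,\mathbf{N}_1 \\ &+ \left[6\left(\mathbf{N}_1^{\top}\mathbf{N}_{\rm dg} \otimes \mathbf{I}\right) + 6\left(\mathbf{N}_1^{\top} \otimes \mathbf{N}_{\rm dg}\right) - 6\left(\mathbf{N}_1^{\top} \otimes \mathbf{I}\right)\right] d\,\mathrm{vec}\,\mathbf{N}_{\rm dg} \end{aligned} \tag{11.9}$$

$$\begin{aligned} d\,\mathrm{vec}\,\mathbf{N}_4 ={}& \left[\mathbf{I} \otimes \left(24\mathbf{N}_{\rm dg}^3 - 36\mathbf{N}_{\rm dg}^2 + 14\mathbf{N}_{\rm dg} - \mathbf{I}\right)\right] d\,\mathrm{vec}\,\mathbf{N}_1 \\ &+ \left[24\left(\mathbf{N}_1^{\top}\mathbf{N}_{\rm dg}^2 \otimes \mathbf{I}\right) + 24\left(\mathbf{N}_1^{\top}\mathbf{N}_{\rm dg} \otimes \mathbf{N}_{\rm dg}\right) + 24\left(\mathbf{N}_1^{\top} \otimes \mathbf{N}_{\rm dg}^2\right) \right. \\ &\left. {}- 36\left(\mathbf{N}_1^{\top}\mathbf{N}_{\rm dg} \otimes \mathbf{I}\right) - 36\left(\mathbf{N}_1^{\top} \otimes \mathbf{N}_{\rm dg}\right) + 14\left(\mathbf{N}_1^{\top} \otimes \mathbf{I}\right)\right] d\,\mathrm{vec}\,\mathbf{N}_{\rm dg} \end{aligned} \tag{11.10}$$

where (see Sect. 2.8)

$$d\mathbf{N}_{\rm dg} = \mathbf{I} \circ d\mathbf{N}_1 \tag{11.11}$$

$$d\,\mathrm{vec}\,\mathbf{N}_{\rm dg} = \mathcal{D}(\mathrm{vec}\,\mathbf{I})\, d\,\mathrm{vec}\,\mathbf{N}_1. \tag{11.12}$$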


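Proof The result (11.7) is derived in Caswell (2006, Section 3.1). For $k > 1$, and considering $\mathbf{N}_k$ as a function of $\mathbf{N}_1$ and $\mathbf{N}_{\rm dg}$, the total differential of $\mathbf{N}_k$ is

$$d\,\mathrm{vec}\,\mathbf{N}_k = \frac{\partial\,\mathrm{vec}\,\mathbf{N}_k}{\partial\,\mathrm{vec}^{\top}\mathbf{N}_1}\, d\,\mathrm{vec}\,\mathbf{N}_1 + \frac{\partial\,\mathrm{vec}\,\mathbf{N}_k}{\partial\,\mathrm{vec}^{\top}\mathbf{N}_{\rm dg}}\, d\,\mathrm{vec}\,\mathbf{N}_{\rm dg}. \tag{11.13}$$

The two terms of (11.13) are the partial differentials of $\mathrm{vec}\,\mathbf{N}_k$, obtained by taking differentials treating only $\mathbf{N}_1$ or only $\mathbf{N}_{\rm dg}$ as variables, respectively. Denote these partial differentials as $\partial_{\mathbf{N}_1}$ and $\partial_{\mathbf{N}_{\rm dg}}$. Differentiating $\mathbf{N}_2$ in (11.4) gives

$$\partial_{\mathbf{N}_1} \mathbf{N}_2 = 2\mathbf{N}_{\rm dg}\,(d\mathbf{N}_1) - d\mathbf{N}_1 \tag{11.14}$$

$$\partial_{\mathbf{N}_{\rm dg}} \mathbf{N}_2 = 2\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_1. \tag{11.15}$$

Applying the vec operator gives

$$\partial_{\mathbf{N}_1} \mathrm{vec}\,\mathbf{N}_2 = \left[2\left(\mathbf{I}_s \otimes \mathbf{N}_{\rm dg}\right) - \mathbf{I}_{s^2}\right] d\,\mathrm{vec}\,\mathbf{N}_1 \tag{11.16}$$

$$\partial_{\mathbf{N}_{\rm dg}} \mathrm{vec}\,\mathbf{N}_2 = 2\left(\mathbf{N}_1^{\top} \otimes \mathbf{I}_s\right) d\,\mathrm{vec}\,\mathbf{N}_{\rm dg}, \tag{11.17}$$

and (11.13) becomes

$$d\,\mathrm{vec}\,\mathbf{N}_2 = \left[2\left(\mathbf{I} \otimes \mathbf{N}_{\rm dg}\right) - \mathbf{I}_{s^2}\right] d\,\mathrm{vec}\,\mathbf{N}_1 + 2\left(\mathbf{N}_1^{\top} \otimes \mathbf{I}\right) d\,\mathrm{vec}\,\mathbf{N}_{\rm dg}, \tag{11.18}$$

which is (11.8). The derivations of $d\,\mathrm{vec}\,\mathbf{N}_3$ and $d\,\mathrm{vec}\,\mathbf{N}_4$ follow the same sequence of steps. The details are given in Appendix A. □


The derivatives of $\mathbf{N}_2$, $\mathbf{N}_3$, and $\mathbf{N}_4$ can be used to study the variance, standard deviation, coefficient of variation, skewness, and kurtosis of the number of visits to the transient states (Caswell 2006, 2009, 2011).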
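As an illustration of Theorem 11.2.1, the following sketch computes the first two moment matrices and their derivatives with respect to the entries of the transient matrix, using (11.7), (11.8), and (11.12). The 3 × 3 transient matrix is hypothetical, chosen only so that its column sums are below 1; the function names are likewise illustrative.

```python
import numpy as np

def vec(M):
    """Stack the columns of M (column-major vec operator)."""
    return M.reshape(-1, order="F")

def visit_moments(U):
    """First and second moments of visits to transient states, (11.3)-(11.4)."""
    I = np.eye(U.shape[0])
    N1 = np.linalg.inv(I - U)
    Ndg = np.diag(np.diag(N1))                  # N_dg = I o N1
    N2 = (2 * Ndg - I) @ N1
    return N1, Ndg, N2

def visit_sensitivities(U):
    """dvec N1 / dvec^T U and dvec N2 / dvec^T U."""
    s = U.shape[0]
    I = np.eye(s)
    N1, Ndg, _ = visit_moments(U)
    dN1 = np.kron(N1.T, N1)                     # (11.7)
    dNdg = np.diag(vec(I)) @ dN1                # (11.12)
    dN2 = ((2 * np.kron(I, Ndg) - np.eye(s * s)) @ dN1
           + 2 * np.kron(N1.T, I) @ dNdg)       # (11.8)
    return dN1, dN2

# Hypothetical transient matrix (column sums 0.7, 0.9, 0.9)
U = np.array([[0.1, 0.2, 0.3],
              [0.5, 0.3, 0.1],
              [0.1, 0.4, 0.5]])
N1, Ndg, N2 = visit_moments(U)
var_visits = N2 - N1 * N1                       # variance of the number of visits
dN1, dN2 = visit_sensitivities(U)
print(np.round(N1, 3))
print(np.round(var_visits, 3))
```

Each column of `dN1` or `dN2` holds the derivative with respect to one entry of the transient matrix, ordered column by column to match the vec operator.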


11.2.2 Time to Absorption 


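Let $\eta_j$ be the time to absorption starting in transient state $j$ and let $\boldsymbol{\eta}_k = \left(E(\eta_1^k), \ldots, E(\eta_s^k)\right)^{\top}$. The first several of these moments are (Iosifescu 1980, Thm. 3.2)

$$\boldsymbol{\eta}_1^{\top} = \mathbf{1}^{\top}\mathbf{N}_1 \tag{11.19}$$

$$\boldsymbol{\eta}_2^{\top} = \boldsymbol{\eta}_1^{\top}\left(2\mathbf{N}_1 - \mathbf{I}\right) \tag{11.20}$$

$$\boldsymbol{\eta}_3^{\top} = \boldsymbol{\eta}_1^{\top}\left(6\mathbf{N}_1^2 - 6\mathbf{N}_1 + \mathbf{I}\right) \tag{11.21}$$

$$\boldsymbol{\eta}_4^{\top} = \boldsymbol{\eta}_1^{\top}\left(24\mathbf{N}_1^3 - 36\mathbf{N}_1^2 + 14\mathbf{N}_1 - \mathbf{I}\right) \tag{11.22}$$


Theorem 11.2.2 Let $\boldsymbol{\eta}_k$ be the vector of the $k$th moments of the $\eta_j$. The sensitivities of these moment vectors are

$$d\boldsymbol{\eta}_1 = \left(\mathbf{I} \otimes \mathbf{1}^{\top}\right) d\,\mathrm{vec}\,\mathbf{N}_1 \tag{11.23}$$

$$d\boldsymbol{\eta}_2 = \left(2\mathbf{N}_1^{\top} - \mathbf{I}\right) d\boldsymbol{\eta}_1 + 2\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\top}\right) d\,\mathrm{vec}\,\mathbf{N}_1 \tag{11.24}$$

$$\begin{aligned} d\boldsymbol{\eta}_3 ={}& \left(6\mathbf{N}_1^2 - 6\mathbf{N}_1 + \mathbf{I}\right)^{\top} d\boldsymbol{\eta}_1 \\ &+ \left[6\left(\mathbf{N}_1^{\top} \otimes \boldsymbol{\eta}_1^{\top}\right) + 6\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\top}\mathbf{N}_1\right) - 6\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\top}\right)\right] d\,\mathrm{vec}\,\mathbf{N}_1 \end{aligned} \tag{11.25}$$

$$\begin{aligned} d\boldsymbol{\eta}_4 ={}& \left(24\mathbf{N}_1^3 - 36\mathbf{N}_1^2 + 14\mathbf{N}_1 - \mathbf{I}\right)^{\top} d\boldsymbol{\eta}_1 \\ &+ \left\{24\left[\left(\mathbf{N}_1^2\right)^{\top} \otimes \boldsymbol{\eta}_1^{\top}\right] + 24\left(\mathbf{N}_1^{\top} \otimes \boldsymbol{\eta}_1^{\top}\mathbf{N}_1\right) + 24\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\top}\mathbf{N}_1^2\right) \right. \\ &\left. {}- 36\left(\mathbf{N}_1^{\top} \otimes \boldsymbol{\eta}_1^{\top}\right) - 36\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\top}\mathbf{N}_1\right) + 14\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\top}\right)\right\} d\,\mathrm{vec}\,\mathbf{N}_1 \end{aligned} \tag{11.26}$$
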
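where $d\,\mathrm{vec}\,\mathbf{N}_1$ is given by (11.7).

Proof The derivative of $\boldsymbol{\eta}_1$ is obtained (Caswell 2006) by differentiating to get $d\boldsymbol{\eta}_1^{\top} = \mathbf{1}^{\top}(d\mathbf{N}_1)$ and then applying the vec operator. For the higher moments, consider the $\boldsymbol{\eta}_k$ to be functions of $\boldsymbol{\eta}_1$ and $\mathbf{N}_1$, and write the total differential

$$d\boldsymbol{\eta}_k = \frac{\partial \boldsymbol{\eta}_k}{\partial \boldsymbol{\eta}_1^{\top}}\, d\boldsymbol{\eta}_1 + \frac{\partial \boldsymbol{\eta}_k}{\partial\,\mathrm{vec}^{\top}\mathbf{N}_1}\, d\,\mathrm{vec}\,\mathbf{N}_1. \tag{11.27}$$

The partial differentials of $\boldsymbol{\eta}_2$ with respect to $\boldsymbol{\eta}_1$ and $\mathbf{N}_1$ are

$$\partial_{\boldsymbol{\eta}_1} \boldsymbol{\eta}_2^{\top} = \left(d\boldsymbol{\eta}_1^{\top}\right)\left(2\mathbf{N}_1 - \mathbf{I}\right) \tag{11.28}$$

$$\partial_{\mathbf{N}_1} \boldsymbol{\eta}_2^{\top} = 2\boldsymbol{\eta}_1^{\top}\,(d\mathbf{N}_1). \tag{11.29}$$

Applying the vec operator gives

$$\partial_{\boldsymbol{\eta}_1} \boldsymbol{\eta}_2 = \left(2\mathbf{N}_1^{\top} - \mathbf{I}\right) d\boldsymbol{\eta}_1 \tag{11.30}$$

$$\partial_{\mathbf{N}_1} \boldsymbol{\eta}_2 = 2\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\top}\right) d\,\mathrm{vec}\,\mathbf{N}_1 \tag{11.31}$$

which combine according to (11.27) to yield (11.24). The derivations of $d\boldsymbol{\eta}_3$ and $d\boldsymbol{\eta}_4$ follow the same sequence of steps; the details are shown in Appendix A. □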
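The first two results of Theorem 11.2.2 can be sketched in a few lines. The example below computes the mean and second moment of the time to absorption and their derivatives with respect to the transient matrix, following (11.19), (11.20), (11.23), and (11.24). The transient matrix is a hypothetical example and the function names are illustrative.

```python
import numpy as np

def longevity(U):
    """Mean and second moment of time to absorption, (11.19)-(11.20)."""
    s = U.shape[0]
    I = np.eye(s)
    N1 = np.linalg.inv(I - U)
    eta1 = N1.T @ np.ones(s)
    eta2 = (2 * N1 - I).T @ eta1
    return N1, eta1, eta2

def longevity_sensitivities(U):
    """d eta1 / dvec^T U and d eta2 / dvec^T U via (11.7), (11.23), (11.24)."""
    s = U.shape[0]
    I = np.eye(s)
    N1, eta1, _ = longevity(U)
    dN1 = np.kron(N1.T, N1)                                     # (11.7)
    deta1 = np.kron(I, np.ones((1, s))) @ dN1                   # (11.23)
    deta2 = ((2 * N1.T - I) @ deta1
             + 2 * np.kron(I, eta1[None, :]) @ dN1)             # (11.24)
    return deta1, deta2

# Hypothetical transient matrix (column sums below 1)
U = np.array([[0.1, 0.2, 0.3],
              [0.5, 0.3, 0.1],
              [0.1, 0.4, 0.5]])
N1, eta1, eta2 = longevity(U)
var_longevity = eta2 - eta1**2
deta1, deta2 = longevity_sensitivities(U)
print(np.round(eta1, 3), np.round(var_longevity, 3))
```

The sensitivity of the variance follows from the chain rule: it is `deta2 - 2 * eta1[:, None] * deta1`.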


11.2.3 Number of States Visited Before Absorption 


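Let $\xi_j \ge 1$ be the number of distinct transient states visited before absorption by an individual starting in transient state $j$, and let $\boldsymbol{\xi}_1 = E(\boldsymbol{\xi})$. Then

$$\boldsymbol{\xi}_1^{\top} = \mathbf{1}^{\top}\mathbf{N}_{\rm dg}^{-1}\mathbf{N}_1 \tag{11.32}$$

(Iosifescu 1980, Sect. 3.2.5), where $\mathbf{N}_{\rm dg}^{-1} = \left(\mathbf{N}_{\rm dg}\right)^{-1}$.

Theorem 11.2.3 Let $\boldsymbol{\xi}_1 = E(\boldsymbol{\xi})$. The sensitivity of $\boldsymbol{\xi}_1$ is

$$d\boldsymbol{\xi}_1 = \left[-\left(\mathbf{N}_1^{\top} \otimes \mathbf{1}^{\top}\right)\left(\mathbf{N}_{\rm dg}^{-1} \otimes \mathbf{N}_{\rm dg}^{-1}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I}) + \left(\mathbf{I} \otimes \mathbf{1}^{\top}\mathbf{N}_{\rm dg}^{-1}\right)\right] d\,\mathrm{vec}\,\mathbf{N}_1, \tag{11.33}$$

where $d\,\mathrm{vec}\,\mathbf{N}_1$ is given by (11.7).

Proof Differentiating (11.32) yields

$$d\boldsymbol{\xi}_1^{\top} = \mathbf{1}^{\top}\left(d\mathbf{N}_{\rm dg}^{-1}\right)\mathbf{N}_1 + \mathbf{1}^{\top}\mathbf{N}_{\rm dg}^{-1}\, d\mathbf{N}_1. \tag{11.34}$$

Applying the vec operator yields

$$d\boldsymbol{\xi}_1 = \left(\mathbf{N}_1^{\top} \otimes \mathbf{1}^{\top}\right) d\,\mathrm{vec}\,\mathbf{N}_{\rm dg}^{-1} + \left(\mathbf{I} \otimes \mathbf{1}^{\top}\mathbf{N}_{\rm dg}^{-1}\right) d\,\mathrm{vec}\,\mathbf{N}_1. \tag{11.35}$$

Applying (2.82) to $d\,\mathrm{vec}\,\mathbf{N}_{\rm dg}^{-1}$ and using (11.12) for $d\,\mathrm{vec}\,\mathbf{N}_{\rm dg}$ gives

$$d\boldsymbol{\xi}_1 = -\left(\mathbf{N}_1^{\top} \otimes \mathbf{1}^{\top}\right)\left(\mathbf{N}_{\rm dg}^{-1} \otimes \mathbf{N}_{\rm dg}^{-1}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I})\, d\,\mathrm{vec}\,\mathbf{N}_1 + \left(\mathbf{I} \otimes \mathbf{1}^{\top}\mathbf{N}_{\rm dg}^{-1}\right) d\,\mathrm{vec}\,\mathbf{N}_1 \tag{11.36}$$

which simplifies to (11.33). □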
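A minimal numerical sketch of (11.32) and (11.33) follows, again for a hypothetical transient matrix. It returns the expected number of distinct transient states visited from each starting state, together with the derivative of that vector with respect to the entries of the transient matrix.

```python
import numpy as np

def states_visited(U):
    """Expected number of distinct transient states visited, (11.32)."""
    s = U.shape[0]
    N1 = np.linalg.inv(np.eye(s) - U)
    Ndg_inv = np.diag(1.0 / np.diag(N1))
    xi = np.ones(s) @ Ndg_inv @ N1
    return N1, Ndg_inv, xi

def states_visited_sensitivity(U):
    """d xi / dvec^T U from (11.33) combined with (11.7)."""
    s = U.shape[0]
    I = np.eye(s)
    ones = np.ones((1, s))
    N1, Ndg_inv, _ = states_visited(U)
    dN1 = np.kron(N1.T, N1)                                      # (11.7)
    term_dg = (-np.kron(N1.T, ones) @ np.kron(Ndg_inv, Ndg_inv)
               @ np.diag(I.reshape(-1, order="F")))
    term_n1 = np.kron(I, ones @ Ndg_inv)
    return (term_dg + term_n1) @ dN1

# Hypothetical transient matrix (column sums below 1)
U = np.array([[0.1, 0.2, 0.3],
              [0.5, 0.3, 0.1],
              [0.1, 0.4, 0.5]])
_, _, xi = states_visited(U)
dxi = states_visited_sensitivity(U)
print(np.round(xi, 3))
```

Because the starting state always counts as visited, each entry of `xi` lies between 1 and the number of transient states.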


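11.2.4 Multiple Absorbing States and Probabilities of Absorption


When the chain includes $a > 1$ absorbing states, the entry $m_{ij}$ of the $a \times s$ submatrix $\mathbf{M}$ in (11.1) is the probability of transition from transient state $j$ to absorbing state $i$. The result of the competing risks of absorption is a set of probabilities $b_{ij} = P[\text{absorption in } i \mid \text{starting in } j]$ for $i = 1, \ldots, a$ and $j = 1, \ldots, s$. The matrix $\mathbf{B} = (b_{ij}) = \mathbf{M}\mathbf{N}_1$ (Iosifescu 1980, Thm. 3.3).


Theorem 11.2.4 Let $\mathbf{B} = \mathbf{M}\mathbf{N}_1$ be the matrix of absorption probabilities. Then

$$d\,\mathrm{vec}\,\mathbf{B} = \left(\mathbf{N}_1^{\top} \otimes \mathbf{I}\right) d\,\mathrm{vec}\,\mathbf{M} + \left(\mathbf{N}_1^{\top} \otimes \mathbf{B}\right) d\,\mathrm{vec}\,\mathbf{U}. \tag{11.37}$$

Proof Differentiating $\mathbf{B}$ yields

$$d\mathbf{B} = (d\mathbf{M})\,\mathbf{N}_1 + \mathbf{M}\,(d\mathbf{N}_1). \tag{11.38}$$

Applying the vec operator gives

$$d\,\mathrm{vec}\,\mathbf{B} = \left(\mathbf{N}_1^{\top} \otimes \mathbf{I}\right) d\,\mathrm{vec}\,\mathbf{M} + \left(\mathbf{I} \otimes \mathbf{M}\right) d\,\mathrm{vec}\,\mathbf{N}_1. \tag{11.39}$$

Substituting (11.7) for $d\,\mathrm{vec}\,\mathbf{N}_1$ and simplifying gives (11.37). □


Column $j$ of $\mathbf{B}$ is the probability distribution of the eventual absorption state for an individual starting in transient state $j$. Usually a few of those starting states are of particular interest (e.g., states corresponding to "birth" or to the start of some process). Let $\mathbf{B}(:, j) = \mathbf{B}\mathbf{e}_j$ denote column $j$ of $\mathbf{B}$, where $\mathbf{e}_j$ is the $j$th unit vector of length $s$. Thus the derivative of $\mathbf{B}(:, j)$ is

$$d\,\mathrm{vec}\,\mathbf{B}(:, j) = \left(\mathbf{e}_j^{\top} \otimes \mathbf{I}_a\right) d\,\mathrm{vec}\,\mathbf{B} \tag{11.40}$$

where $d\,\mathrm{vec}\,\mathbf{B}$ is given by (11.37). Similarly, row $i$ of $\mathbf{B}$ is $\mathbf{B}(i, :) = \mathbf{e}_i^{\top}\mathbf{B}$ and

$$d\,\mathrm{vec}\,\mathbf{B}(i, :) = \left(\mathbf{I}_s \otimes \mathbf{e}_i^{\top}\right) d\,\mathrm{vec}\,\mathbf{B} \tag{11.41}$$

where $\mathbf{e}_i$ is the $i$th unit vector of length $a$.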
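The following sketch evaluates (11.37) and the column selection in (11.40) for a small chain. The matrices are hypothetical: three transient states and two absorbing states, with the columns of the stacked matrix summing to 1.

```python
import numpy as np

# Hypothetical chain: s = 3 transient states, a = 2 absorbing states
U = np.array([[0.1, 0.2, 0.3],
              [0.5, 0.3, 0.1],
              [0.1, 0.4, 0.5]])
M = np.array([[0.2, 0.1, 0.04],
              [0.1, 0.0, 0.06]])
s, a = U.shape[0], M.shape[0]

def absorption_probabilities(U, M):
    """B = M N1, the matrix of absorption probabilities."""
    return M @ np.linalg.inv(np.eye(U.shape[0]) - U)

N1 = np.linalg.inv(np.eye(s) - U)
B = absorption_probabilities(U, M)

# The two terms of (11.37): derivatives with respect to vec M and vec U
dB_dM = np.kron(N1.T, np.eye(a))        # (a s) x (a s)
dB_dU = np.kron(N1.T, B)                # (a s) x (s s)

# (11.40): sensitivity of column j of B (fate of individuals starting in state j)
j = 0
e_j = np.zeros((1, s))
e_j[0, j] = 1.0
dBj_dU = np.kron(e_j, np.eye(a)) @ dB_dU
print(np.round(B, 3))
```

In an application the derivatives with respect to underlying parameters follow from the chain rule, multiplying `dB_dM` and `dB_dU` by the derivatives of vec M and vec U with respect to those parameters and adding the results.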


11.2.5 The Quasistationary Distribution 


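The quasistationary distribution of an absorbing Markov chain gives the limiting probability distribution, over the set of transient states, of the state of an individual that has yet to be absorbed. Let $\mathbf{w}$ and $\mathbf{v}$ be the right and left eigenvectors associated with the dominant eigenvalue of $\mathbf{U}$, normalized so that $\|\mathbf{w}\| = \|\mathbf{v}\| = 1$. Darroch and Seneta (1965) defined two quasistationary distributions in terms of $\mathbf{w}$ and $\mathbf{v}$. The limiting probability distribution of the state of an individual, given that absorption has not yet happened, converges to

$$\mathbf{q}_a = \mathbf{w}. \tag{11.42}$$

The limiting probability distribution of the state of an individual, given that absorption has not happened and will not happen for a long time, is

$$\mathbf{q}_b = \frac{\mathbf{w} \circ \mathbf{v}}{\mathbf{w}^{\top}\mathbf{v}}. \tag{11.43}$$

Horvitz and Tuljapurkar (2008) pointed out that the convergence to the quasistationary distribution implies that, in a stage-classified model, mortality eventually becomes independent of age.


Lemma 1 Let the dominant eigenvalue of $\mathbf{U}$, guaranteed real and nonnegative by the Perron-Frobenius theorem, satisfy $0 < \lambda < 1$, and let $\mathbf{w}$ and $\mathbf{v}$ be the right and left eigenvectors corresponding to $\lambda$, scaled so that $\mathbf{1}^{\top}\mathbf{w} = \mathbf{1}^{\top}\mathbf{v} = 1$. Then

$$d\mathbf{w} = \left(\lambda\mathbf{I}_s - \mathbf{U} + \mathbf{w}\mathbf{1}^{\top}\mathbf{U}\right)^{-1} \left[\mathbf{w}^{\top} \otimes \left(\mathbf{I}_s - \mathbf{w}\mathbf{1}^{\top}\right)\right] d\,\mathrm{vec}\,\mathbf{U} \tag{11.44}$$

$$d\mathbf{v} = \left(\lambda\mathbf{I}_s - \mathbf{U}^{\top} + \mathbf{v}\mathbf{1}^{\top}\mathbf{U}^{\top}\right)^{-1} \left[\left(\mathbf{I}_s - \mathbf{v}\mathbf{1}^{\top}\right) \otimes \mathbf{v}^{\top}\right] d\,\mathrm{vec}\,\mathbf{U} \tag{11.45}$$

Proof Equation (11.44) is proven in Caswell (2008, Section 6.1). Equation (11.45) is obtained by treating $\mathbf{v}$ as the right eigenvector of $\mathbf{U}^{\top}$. □


Theorem 11.2.5 The derivative of the quasistationary distribution $\mathbf{q}_a$ is given by (11.44). The derivative of the quasistationary distribution $\mathbf{q}_b$ is

$$d\mathbf{q}_b = \frac{1}{\mathbf{v}^{\top}\mathbf{w}} \left[\left(\mathcal{D}(\mathbf{v}) - \mathbf{q}_b\mathbf{v}^{\top}\right) d\mathbf{w} + \left(\mathcal{D}(\mathbf{w}) - \mathbf{q}_b\mathbf{w}^{\top}\right) d\mathbf{v}\right] \tag{11.46}$$

where $d\mathbf{w}$ and $d\mathbf{v}$ are given by (11.44) and (11.45) respectively.


Proof The derivative of $\mathbf{q}_a$ follows from its definition as the scaled right eigenvector of $\mathbf{U}$. For $\mathbf{q}_b$, differentiating (11.43) gives

$$d\mathbf{q}_b = \frac{1}{\left(\mathbf{v}^{\top}\mathbf{w}\right)^2} \left\{\left(\mathbf{v}^{\top}\mathbf{w}\right) d(\mathbf{v} \circ \mathbf{w}) - (\mathbf{v} \circ \mathbf{w}) \left[\left(d\mathbf{v}^{\top}\right)\mathbf{w} + \mathbf{v}^{\top}(d\mathbf{w})\right]\right\} \tag{11.47}$$

$$= \frac{1}{\mathbf{v}^{\top}\mathbf{w}} \left[d(\mathbf{v} \circ \mathbf{w}) - \mathbf{q}_b\left(d\mathbf{v}^{\top}\right)\mathbf{w} - \mathbf{q}_b\mathbf{v}^{\top}(d\mathbf{w})\right]. \tag{11.48}$$

Applying the vec operator gives

$$d\mathbf{q}_b = \frac{1}{\mathbf{v}^{\top}\mathbf{w}} \left[\mathcal{D}(\mathbf{v})\, d\mathbf{w} + \mathcal{D}(\mathbf{w})\, d\mathbf{v} - \left(\mathbf{w}^{\top} \otimes \mathbf{q}_b\right) d\mathbf{v} - \mathbf{q}_b\mathbf{v}^{\top}\, d\mathbf{w}\right] \tag{11.49}$$

which simplifies to give (11.46). □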
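Lemma 1 and Theorem 11.2.5 can be sketched numerically as follows. The example assumes that both eigenvectors are scaled to sum to 1, uses a hypothetical positive transient matrix so that the dominant eigenvalue is simple, and obtains the eigenvectors with a general-purpose eigenvalue routine. Function names are illustrative.

```python
import numpy as np

def dominant(A):
    """Dominant eigenvalue and its eigenvector, scaled so entries sum to 1."""
    vals, vecs = np.linalg.eig(A)
    i = np.argmax(vals.real)
    w = vecs[:, i].real
    return vals[i].real, w / w.sum()

def quasistationary(U):
    """Eigenvalue, right and left eigenvectors, and q_b of (11.43)."""
    lam, w = dominant(U)
    _, v = dominant(U.T)
    return lam, w, v, w * v / (w @ v)

def quasistationary_sensitivities(U):
    """dw, dv, dq_b with respect to vec^T U, from (11.44)-(11.46)."""
    s = U.shape[0]
    I = np.eye(s)
    one = np.ones((s, 1))
    lam, w, v, qb = quasistationary(U)
    wc, vc = w[:, None], v[:, None]
    dw = np.linalg.solve(lam * I - U + wc @ one.T @ U,
                         np.kron(wc.T, I - wc @ one.T))          # (11.44)
    dv = np.linalg.solve(lam * I - U.T + vc @ one.T @ U.T,
                         np.kron(I - vc @ one.T, vc.T))          # (11.45)
    dqb = ((np.diag(v) - np.outer(qb, v)) @ dw
           + (np.diag(w) - np.outer(qb, w)) @ dv) / (w @ v)      # (11.46)
    return dw, dv, dqb

# Hypothetical positive transient matrix (column sums below 1)
U = np.array([[0.1, 0.2, 0.3],
              [0.5, 0.3, 0.1],
              [0.1, 0.4, 0.5]])
lam, w, v, qb = quasistationary(U)
dw, dv, dqb = quasistationary_sensitivities(U)
print(round(lam, 4), np.round(w, 4), np.round(qb, 4))
```

Since both quasistationary distributions sum to 1, every column of `dw` and of `dqb` sums to zero, which is a quick check on an implementation.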


11.3 Life Lost Due to Mortality 


The approach here makes it easy to compute the sensitivity of a variety of dependent 
variables calculated from the Markov chain. As an example of this flexibility, 
consider a recently developed demographic index, the number of years of life lost 
due to mortality (Vaupel and Canudas Romo 2003). 

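The transient states of the chain are age classes, absorption corresponds to death, and absorbing states correspond to age at death. Let $\mu_i$ be the mortality rate and $p_i = \exp(-\mu_i)$ the survival probability at age $i$. The matrix $\mathbf{U}$ has the $p_i$ on the subdiagonal and zeros elsewhere. The matrix $\mathbf{M}$ has $1 - p_i$ on the diagonal and zeros elsewhere. Let $\mathbf{f} = \mathbf{B}(:, 1)$ be the distribution of age at death and $\boldsymbol{\eta}_1$ the vector of expected longevity as a function of age.

A death at age $i$ represents the loss of some number of years of life beyond that age. The expectation of that loss is given by the $i$th entry of $\boldsymbol{\eta}_1$, and the expected number of years lost over the distribution of age at death is $\eta^{\dagger} = \boldsymbol{\eta}_1^{\top}\mathbf{f}$. This quantity also measures the disparity among individuals in longevity (Vaupel and Canudas Romo 2003). If everyone died at the identical age $x$, $\mathbf{f}$ would be a delta function at $x$ and further life expectancy at age $x$ would be zero; their product would give $\eta^{\dagger} = 0$. Declines in this disparity have accompanied increases in life expectancy observed in developed countries (Edwards and Tuljapurkar 2005; Wilmoth and Horiuchi 1999). Thus it is useful to know how $\eta^{\dagger}$ responds to changes in mortality.

Differentiating $\eta^{\dagger}$ gives

$$d\eta^{\dagger} = \left(d\boldsymbol{\eta}_1^{\top}\right)\mathbf{B}\mathbf{e}_1 + \boldsymbol{\eta}_1^{\top}\,(d\mathbf{B})\,\mathbf{e}_1. \tag{11.50}$$

Applying the vec operator gives

$$d\eta^{\dagger} = \mathbf{e}_1^{\top}\mathbf{B}^{\top}\, d\boldsymbol{\eta}_1 + \left(\mathbf{e}_1^{\top} \otimes \boldsymbol{\eta}_1^{\top}\right) d\,\mathrm{vec}\,\mathbf{B}. \tag{11.51}$$

Substituting (11.23) for $d\boldsymbol{\eta}_1$ and (11.37) for $d\,\mathrm{vec}\,\mathbf{B}$ gives

$$d\eta^{\dagger} = \mathbf{e}_1^{\top}\mathbf{B}^{\top}\left(\mathbf{I} \otimes \mathbf{1}^{\top}\right) d\,\mathrm{vec}\,\mathbf{N}_1 + \left(\mathbf{e}_1^{\top} \otimes \boldsymbol{\eta}_1^{\top}\right) \left[\left(\mathbf{N}_1^{\top} \otimes \mathbf{I}\right) d\,\mathrm{vec}\,\mathbf{M} + \left(\mathbf{N}_1^{\top} \otimes \mathbf{B}\right) d\,\mathrm{vec}\,\mathbf{U}\right]. \tag{11.52}$$

Simplifying and writing derivatives in terms of $\boldsymbol{\mu}$ gives

$$\frac{d\eta^{\dagger}}{d\boldsymbol{\mu}^{\top}} = \left[\mathbf{e}_1^{\top}\mathbf{B}^{\top}\left(\mathbf{N}_1^{\top} \otimes \boldsymbol{\eta}_1^{\top}\right) + \left(\mathbf{e}_1^{\top}\mathbf{N}_1^{\top} \otimes \boldsymbol{\eta}_1^{\top}\mathbf{B}\right)\right] \frac{d\,\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\mu}^{\top}} + \left(\mathbf{e}_1^{\top}\mathbf{N}_1^{\top} \otimes \boldsymbol{\eta}_1^{\top}\right) \frac{d\,\mathrm{vec}\,\mathbf{M}}{d\boldsymbol{\mu}^{\top}}. \tag{11.53}$$

Because mortality rates vary over several orders of magnitude with age, it is useful to present the results as elasticities,

$$\frac{\epsilon \eta^{\dagger}}{\epsilon \boldsymbol{\mu}^{\top}} = \frac{1}{\eta^{\dagger}} \frac{d\eta^{\dagger}}{d\boldsymbol{\mu}^{\top}}\, \mathcal{D}(\boldsymbol{\mu}). \tag{11.54}$$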
Figure 11.1 shows these elasticities for two populations chosen to have very different life expectancies: India in 1961, with female life expectancy of 45 years and $\eta^{\dagger} = 23.9$ years, and Japan in 2006, with female life expectancy of 86 years and $\eta^{\dagger} = 10.1$ years (Human Mortality Database 2016). In both cases, elasticities are positive from birth to some age (50 for India, ~85 for Japan) and negative thereafter. This implies that reductions in infant and early life mortality would reduce $\eta^{\dagger}$, whereas reductions in old age mortality would increase $\eta^{\dagger}$. Zhang and Vaupel (2009) have shown that the existence of such a critical age is a general property of these models.

Fig. 11.1 The elasticity of mean years of life lost due to mortality, $\eta^{\dagger}$, to changes in …


11.4 Ergodic Chains


Now let us consider perturbations of an ergodic finite-state Markov chain with an irreducible, primitive, column-stochastic transition matrix $\mathbf{P}$ of dimension $s \times s$. The stationary distribution $\boldsymbol{\pi}$ is given by the right eigenvector, scaled to sum to 1, corresponding to the dominant eigenvalue $\lambda_1 = 1$ of $\mathbf{P}$. The fundamental matrix of the chain is $\mathbf{Z} = \left(\mathbf{I} - \mathbf{P} + \boldsymbol{\pi}\mathbf{1}^{\top}\right)^{-1}$ (Kemeny and Snell 1960).

We are interested only in perturbations that preserve the column-stochasticity of $\mathbf{P}$; i.e., for which $\mathbf{P}$ remains a stochastic matrix. Such perturbations are easily defined when the $p_{ij}$ depend explicitly on a parameter vector $\boldsymbol{\theta}$. However, when the parameters of interest are the $p_{ij}$ themselves, an implicit parameterization must be defined to preserve the stochastic nature of $\mathbf{P}$ under perturbation (Conlisk 1985; Caswell 2001). In Sect. 11.4.5 we will explore new expressions for two different forms of implicit parameterization.

Previous studies of perturbations of ergodic chains focus almost completely on 
perturbations of the stationary distribution, and are divided between those focusing 
on sensitivity as a derivative (e.g., Schweitzer 1968; Conlisk 1985; Golub and 
Meyer 1986) and studies focusing on perturbation bounds and condition numbers 
(Funderlic and Meyer 1986; Meyer 1994; Seneta 1988; Hunter 2005; Kirkland 
2003); for reviews see Cho and Meyer (2000) and Kirkland et al. (2008). The 
approach here is similar in spirit to that of Schweitzer (1968), Conlisk (1985), and 
Golub and Meyer (1986), in that we focus on derivatives of Markov chain properties 
with respect to parameter perturbations, but taking advantage of the matrix calculus 
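approach. We do not consider perturbation bounds here.


11.4.1 The Stationary Distribution


Theorem 11.4.1 Let $\boldsymbol{\pi}$ be the stationary distribution, satisfying $\mathbf{P}\boldsymbol{\pi} = \boldsymbol{\pi}$ and $\mathbf{1}^{\top}\boldsymbol{\pi} = 1$. The sensitivity of $\boldsymbol{\pi}$ is

$$d\boldsymbol{\pi} = \left[\boldsymbol{\pi}^{\top} \otimes \left(\mathbf{Z} - \boldsymbol{\pi}\mathbf{1}^{\top}\right)\right] d\,\mathrm{vec}\,\mathbf{P} \tag{11.55}$$

where $\mathbf{Z}$ is the fundamental matrix of the chain.


Proof The vector $\boldsymbol{\pi}$ is the right eigenvector of $\mathbf{P}$, scaled to sum to 1. Applying Lemma 1, and noting that $\lambda = 1$ and $\mathbf{1}^{\top}\mathbf{P} = \mathbf{1}^{\top}$, gives $d\boldsymbol{\pi} = \mathbf{Z}\left[\boldsymbol{\pi}^{\top} \otimes \left(\mathbf{I}_s - \boldsymbol{\pi}\mathbf{1}^{\top}\right)\right] d\,\mathrm{vec}\,\mathbf{P}$. Noting that $\mathbf{Z}\boldsymbol{\pi} = \boldsymbol{\pi}$ and simplifying the Kronecker products yields (11.55). □


Based on an analysis of eigenvector sensitivity (Meyer and Stewart 1982), Golub and Meyer (1986) derived an expression for the derivative of $\boldsymbol{\pi}$ to a change in a single element of $\mathbf{P}$ using the group generalized inverse $(\mathbf{I} - \mathbf{P})^{\#}$ of $\mathbf{I} - \mathbf{P}$. Since $(\mathbf{I} - \mathbf{P})^{\#} = \mathbf{Z} - \boldsymbol{\pi}\mathbf{1}^{\top}$ (Golub and Meyer 1986), expression (11.55) is exactly the Golub-Meyer result expressed in matrix calculus notation. Our results here permit sensitivity analysis of functions of $\boldsymbol{\pi}$ using only the chain rule. If $\mathbf{g}(\boldsymbol{\pi})$ is a vector- or scalar-valued function of $\boldsymbol{\pi}$, then

$$d\mathbf{g}(\boldsymbol{\pi}) = \frac{d\mathbf{g}}{d\boldsymbol{\pi}^{\top}} \frac{d\boldsymbol{\pi}}{d\,\mathrm{vec}^{\top}\mathbf{P}}\, d\,\mathrm{vec}\,\mathbf{P}. \tag{11.56}$$

Some examples will appear in Sect. 11.5.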
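To make (11.55) and (11.56) concrete, the sketch below evaluates the sensitivity of the stationary distribution for a hypothetical 3 × 3 column-stochastic matrix and applies it to a perturbation that keeps the column sums equal to 1. The scalar function used for the chain rule, a weighted sum of the stationary probabilities with hypothetical scores, is an assumption made for the illustration.

```python
import numpy as np

def stationary(P):
    """Right eigenvector of column-stochastic P for eigenvalue 1, scaled to sum to 1."""
    vals, vecs = np.linalg.eig(P)
    pi = vecs[:, np.argmax(vals.real)].real
    return pi / pi.sum()

# Hypothetical column-stochastic transition matrix
P = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
s = P.shape[0]
one = np.ones(s)
pi = stationary(P)
Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, one))     # fundamental matrix

# (11.55): d pi / d vec^T P, an s x s^2 matrix
dpi_dvecP = np.kron(pi[None, :], Z - np.outer(pi, one))

# A perturbation that preserves column sums: in column 1, shift probability
# from the transition into state 3 to the transition into state 1.
Delta = np.zeros((s, s))
Delta[0, 0], Delta[2, 0] = 1.0, -1.0
dpi = dpi_dvecP @ Delta.reshape(-1, order="F")

# (11.56) for the scalar function g(pi) = c^T pi, for which dg/dpi^T = c^T
c = np.array([1.0, 2.0, 3.0])       # hypothetical state scores
dg = c @ dpi
print(np.round(pi, 4), np.round(dpi, 4), round(dg, 4))
```

The entries of `dpi` sum to zero, as they must if the perturbed vector is to remain a probability distribution.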


11.4.2 The Fundamental Matrix 


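The fundamental matrix $\mathbf{Z} = \left(\mathbf{I} - \mathbf{P} + \boldsymbol{\pi}\mathbf{1}^{\top}\right)^{-1}$ plays a role in ergodic chains similar to that played by $\mathbf{N}_1$ in absorbing chains (Kemeny and Snell 1960). It has been extended using generalized inverses (Meyer 1975; Kemeny 1981), but we do not consider those extensions here.


Theorem 11.4.2 The sensitivity of the fundamental matrix is

$$d\,\mathrm{vec}\,\mathbf{Z} = \left(\mathbf{Z}^{\top} \otimes \mathbf{Z}\right) \left\{\mathbf{I}_{s^2} - \left[\mathbf{1}\boldsymbol{\pi}^{\top} \otimes \left(\mathbf{Z} - \boldsymbol{\pi}\mathbf{1}^{\top}\right)\right]\right\} d\,\mathrm{vec}\,\mathbf{P}. \tag{11.57}$$

Proof From (2.82),

$$d\,\mathrm{vec}\,\mathbf{Z} = -\left(\mathbf{Z}^{\top} \otimes \mathbf{Z}\right) d\,\mathrm{vec}\left(\mathbf{I} - \mathbf{P} + \boldsymbol{\pi}\mathbf{1}^{\top}\right) \tag{11.58}$$

$$= \left(\mathbf{Z}^{\top} \otimes \mathbf{Z}\right) \left(d\,\mathrm{vec}\,\mathbf{P} - \left(\mathbf{1} \otimes \mathbf{I}_s\right) d\boldsymbol{\pi}\right). \tag{11.59}$$

Substituting (11.55) for $d\boldsymbol{\pi}$ and simplifying gives (11.57). □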


11.4.3 The First Passage Time Matrix 


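Let $\mathbf{R} = (r_{ij})$ be the matrix of mean first passage times from $j$ to $i$, given by Iosifescu (1980, Thm. 4.7).

$$\mathbf{R} = \mathcal{D}(\boldsymbol{\pi})^{-1} \left(\mathbf{I} - \mathbf{Z} + \mathbf{Z}_{\rm dg}\mathbf{E}\right). \tag{11.60}$$

Again, this is the transpose of the expression obtained when $\mathbf{P}$ is row-stochastic.


Theorem 11.4.3 The sensitivity of $\mathbf{R}$ is

$$\begin{aligned} d\,\mathrm{vec}\,\mathbf{R} ={}& -\left[\mathbf{R}^{\top} \otimes \mathcal{D}(\boldsymbol{\pi})^{-1}\right] \mathcal{D}(\mathrm{vec}\,\mathbf{I}_s) \left(\mathbf{1} \otimes \mathbf{I}_s\right) d\boldsymbol{\pi} \\ &- \left\{\left[\mathbf{I}_s \otimes \mathcal{D}(\boldsymbol{\pi})^{-1}\right] - \left[\mathbf{E} \otimes \mathcal{D}(\boldsymbol{\pi})^{-1}\right] \mathcal{D}(\mathrm{vec}\,\mathbf{I}_s)\right\} d\,\mathrm{vec}\,\mathbf{Z} \end{aligned} \tag{11.61}$$

where $d\boldsymbol{\pi}$ is given by (11.55) and $d\,\mathrm{vec}\,\mathbf{Z}$ is given by (11.57).

Proof Differentiating (11.60) gives

$$d\mathbf{R} = d\left[\mathcal{D}(\boldsymbol{\pi})^{-1}\right] \left(\mathbf{I} - \mathbf{Z} + \mathbf{Z}_{\rm dg}\mathbf{E}\right) + \mathcal{D}(\boldsymbol{\pi})^{-1} \left[-d\mathbf{Z} + \left(d\mathbf{Z}_{\rm dg}\right)\mathbf{E}\right]. \tag{11.62}$$

Applying the vec operator gives

$$\begin{aligned} d\,\mathrm{vec}\,\mathbf{R} ={}& \left[\left(\mathbf{I} - \mathbf{Z} + \mathbf{Z}_{\rm dg}\mathbf{E}\right)^{\top} \otimes \mathbf{I}_s\right] d\,\mathrm{vec}\left[\mathcal{D}(\boldsymbol{\pi})^{-1}\right] \\ &- \left[\mathbf{I}_s \otimes \mathcal{D}(\boldsymbol{\pi})^{-1}\right] d\,\mathrm{vec}\,\mathbf{Z} + \left[\mathbf{E} \otimes \mathcal{D}(\boldsymbol{\pi})^{-1}\right] d\,\mathrm{vec}\,\mathbf{Z}_{\rm dg}. \end{aligned} \tag{11.63}$$

Using (2.82) for $d\,\mathrm{vec}\left[\mathcal{D}(\boldsymbol{\pi})^{-1}\right]$, (2.69) for $d\,\mathrm{vec}\,\mathcal{D}(\boldsymbol{\pi})$, and (11.12) for $d\,\mathrm{vec}\,\mathbf{Z}_{\rm dg}$ yields

$$\begin{aligned} d\,\mathrm{vec}\,\mathbf{R} ={}& -\left[\mathbf{R}^{\top}\mathcal{D}(\boldsymbol{\pi}) \otimes \mathbf{I}_s\right] \left[\mathcal{D}(\boldsymbol{\pi})^{-1} \otimes \mathcal{D}(\boldsymbol{\pi})^{-1}\right] \mathcal{D}(\mathrm{vec}\,\mathbf{I}_s) \left(\mathbf{1} \otimes \mathbf{I}_s\right) d\boldsymbol{\pi} \\ &- \left[\mathbf{I}_s \otimes \mathcal{D}(\boldsymbol{\pi})^{-1}\right] d\,\mathrm{vec}\,\mathbf{Z} + \left[\mathbf{E} \otimes \mathcal{D}(\boldsymbol{\pi})^{-1}\right] \mathcal{D}(\mathrm{vec}\,\mathbf{I}_s)\, d\,\mathrm{vec}\,\mathbf{Z} \end{aligned} \tag{11.64}$$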


which simplifies to give (11.61). o 
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11.4.44 Mixing Time and the Kemeny Constant 


The mixing time $K$ of a chain is the mean time required to get from a specified
state to a state chosen at random from the stationary distribution $\boldsymbol{\pi}$. Remarkably, $K$
is independent of the starting state (Grinstead and Snell 2003; Hunter 2006) and is
sometimes called Kemeny's constant; it is a measure of the rate of convergence to
stationarity, and is given by $K = \mathrm{trace}(\mathbf{Z})$ (Hunter 2006). In addition to being a quantity of
interest in itself, the rate of convergence also plays a role in the sensitivity of the
stationary distribution of ergodic chains (Hunter 2005; Mitrophanov 2005).


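Theorem 11.4.4 The sensitivity of $K$ is

$$dK = (\mathrm{vec}\,\mathbf{I})^{\mathsf T}\, d\mathrm{vec}\,\mathbf{Z}. \tag{11.65}$$

Proof Differentiating $K = \mathrm{trace}(\mathbf{Z})$ gives

$$dK = \mathbf{1}^{\mathsf T}\left(\mathbf{I} \circ d\mathbf{Z}\right)\mathbf{1}. \tag{11.66}$$

Applying the vec operator gives

$$dK = \left(\mathbf{1}^{\mathsf T} \otimes \mathbf{1}^{\mathsf T}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I})\, d\mathrm{vec}\,\mathbf{Z} \tag{11.67}$$

which simplifies to (11.65). □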
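
The following sketch combines (11.57) and (11.65) and compares the resulting derivative of $K$ with a central finite difference. It is an illustration under stated assumptions, not the computation used for the figures: the 3-state matrix, the perturbation direction `Delta` (chosen with zero column sums so that the perturbed matrix stays column-stochastic), and all function names are hypothetical.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major vec operator)."""
    return A.reshape(-1, order="F")

def chain_quantities(P):
    """Stationary distribution pi and fundamental matrix Z of a column-stochastic P."""
    s = P.shape[0]
    one = np.ones(s)
    pi = np.linalg.solve(np.eye(s) - P + np.ones((s, s)), one)
    Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, one))
    return pi, Z

def kemeny(P):
    """Kemeny constant K = trace(Z)."""
    return np.trace(chain_quantities(P)[1])

def dK_dvecP(P):
    """dK / dvec^T P, from (11.65) applied to the Jacobian in (11.57)."""
    s = P.shape[0]
    one = np.ones(s)
    pi, Z = chain_quantities(P)
    inner = np.eye(s * s) - np.kron(np.outer(one, pi), Z - np.outer(pi, one))
    dvecZ = np.kron(Z.T, Z) @ inner
    return vec(np.eye(s)) @ dvecZ

P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.4],
              [0.1, 0.3, 0.5]])
Delta = np.array([[1.0, 0.0, -1.0],
                  [-1.0, 1.0, 0.0],
                  [0.0, -1.0, 1.0]])  # zero column sums
eps = 1e-6
fd = (kemeny(P + eps * Delta) - kemeny(P - eps * Delta)) / (2 * eps)
analytic = dK_dvecP(P) @ vec(Delta)
print(kemeny(P), fd, analytic)
```

The same `chain_quantities` output can be used to confirm that $K$ does not depend on the starting state: weighting the columns of $\mathbf{R}$ in (11.60) by $\boldsymbol{\pi}$ returns $\mathrm{trace}(\mathbf{Z})$ for every starting state.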


11.4.5 Implicit Parameters and Compensation 


Theorems 11.4.1, 11.4.2, 11.4.3, and 11.4.4 are written in terms of $d\mathrm{vec}\,\mathbf{P}$. However,
perturbation of any element, say $p_{kj}$, to $p_{kj} + \theta_{kj}$, must be compensated for by
adjustments of the other elements in column $j$ so that the column sum remains
equal to 1 (Conlisk 1985). Two kinds of compensation are likely to be of use
in applications: additive and proportional. Additive compensation adjusts all the
other elements of the column by an equal amount, distributing the perturbation $\theta_{kj}$
additively over column $j$. Proportional compensation distributes $\theta_{kj}$ in proportion
to the values of the $p_{ij}$, for $i \neq k$. Proportional compensation is attractive because
it preserves the pattern of zero and non-zero elements within $\mathbf{P}$.

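To develop the compensation formulae, let us start by considering a probability
vector $\mathbf{p}$, of dimension $s \times 1$, with $p_i \geq 0$ and $\sum_i p_i = 1$. Let $\theta_i$ be the perturbation
of $p_i$, and write

$$\mathbf{p}(\boldsymbol{\theta}) = \mathbf{p}(0) + \mathbf{A}\boldsymbol{\theta} \tag{11.68}$$

for some matrix $\mathbf{A}$ to be determined. If $y$ is a function of $\mathbf{p}$, then

$$\frac{dy}{d\boldsymbol{\theta}^{\mathsf T}} = \frac{dy}{d\mathbf{p}^{\mathsf T}}\,\frac{d\mathbf{p}}{d\boldsymbol{\theta}^{\mathsf T}} \tag{11.69}$$

evaluated at $\boldsymbol{\theta} = 0$.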
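Additive compensation For the case of additive compensation, we write

$$\begin{aligned}
p_1(\boldsymbol{\theta}) &= p_1(0) + \theta_1 - \frac{\theta_2}{s-1} - \cdots - \frac{\theta_s}{s-1} \\
p_2(\boldsymbol{\theta}) &= p_2(0) - \frac{\theta_1}{s-1} + \theta_2 - \cdots - \frac{\theta_s}{s-1} \\
&\;\;\vdots \\
p_s(\boldsymbol{\theta}) &= p_s(0) - \frac{\theta_1}{s-1} - \frac{\theta_2}{s-1} - \cdots + \theta_s
\end{aligned} \tag{11.70}$$

The perturbation $\theta_1$ is added to $p_1$ and compensated for by subtracting $\theta_1/(s-1)$
from all other entries of $\mathbf{p}$; clearly $\sum_i p_i(\boldsymbol{\theta}) = 1$ for any perturbation vector $\boldsymbol{\theta}$.
The system of Eqs. (11.70) can be written

$$\mathbf{p}(\boldsymbol{\theta}) = \mathbf{p}(0) + \left(\mathbf{I} - \frac{1}{s-1}\mathbf{C}\right)\boldsymbol{\theta}. \tag{11.71}$$

With $\mathbf{E}$ defined as a matrix of ones, the matrix $\mathbf{C}$ (a so-called Toeplitz matrix)
can be written as $\mathbf{C} = \mathbf{E} - \mathbf{I}$, with zeros on the diagonal and ones elsewhere. Thus
the matrix $\mathbf{A}$ in (11.68) is

$$\mathbf{A} = \mathbf{I} - \frac{1}{s-1}\mathbf{C}. \tag{11.72}$$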


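Proportional compensation For proportional compensation, assume that $p_i < 1$
for all $i$. The vector $\mathbf{p}(\boldsymbol{\theta})$ is

$$\begin{aligned}
p_1(\boldsymbol{\theta}) &= p_1(0) + \theta_1 - \frac{\theta_2\, p_1}{1-p_2} - \cdots - \frac{\theta_s\, p_1}{1-p_s} \\
p_2(\boldsymbol{\theta}) &= p_2(0) - \frac{\theta_1\, p_2}{1-p_1} + \theta_2 - \cdots - \frac{\theta_s\, p_2}{1-p_s} \\
&\;\;\vdots \\
p_s(\boldsymbol{\theta}) &= p_s(0) - \frac{\theta_1\, p_s}{1-p_1} - \frac{\theta_2\, p_s}{1-p_2} - \cdots + \theta_s
\end{aligned} \tag{11.73}$$

The perturbation $\theta_1$ is added to $p_1$ and compensated for by subtracting $\theta_1 p_i/(1-p_1)$
from the $i$th entry of $\mathbf{p}$. Again, $\sum_i p_i(\boldsymbol{\theta}) = 1$ for any perturbation vector $\boldsymbol{\theta}$.
Equation (11.73) can be written

$$\mathbf{p}(\boldsymbol{\theta}) = \mathbf{p}(0) + \left[\mathbf{I} - \mathcal{D}(\mathbf{p})\,\mathbf{C}\,\mathcal{D}(\mathbf{1}-\mathbf{p})^{-1}\right]\boldsymbol{\theta} \tag{11.74}$$

so that the matrix $\mathbf{A}$ in (11.68) is

$$\mathbf{A} = \mathbf{I} - \mathcal{D}(\mathbf{p})\,\mathbf{C}\,\mathcal{D}(\mathbf{1}-\mathbf{p})^{-1}. \tag{11.75}$$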
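
The two compensation matrices (11.72) and (11.75) are easy to build and compare numerically. The sketch below is illustrative only; the probability vector, the perturbation, and the function names are hypothetical. It shows that both patterns keep the entries summing to 1, while only proportional compensation leaves a zero entry at zero.

```python
import numpy as np

def additive_A(s):
    """Compensation matrix A = I - C/(s-1) of (11.72), with C = E - I."""
    C = np.ones((s, s)) - np.eye(s)
    return np.eye(s) - C / (s - 1)

def proportional_A(p):
    """Compensation matrix A = I - D(p) C D(1-p)^{-1} of (11.75); requires p_i < 1."""
    p = np.asarray(p, dtype=float)
    s = p.size
    C = np.ones((s, s)) - np.eye(s)
    return np.eye(s) - np.diag(p) @ C @ np.diag(1.0 / (1.0 - p))

# Hypothetical probability vector with one structural zero.
p = np.array([0.5, 0.3, 0.2, 0.0])
theta = np.array([0.01, 0.0, 0.0, 0.0])   # perturb the first entry only

p_add = p + additive_A(p.size) @ theta     # last entry becomes negative
p_prop = p + proportional_A(p) @ theta     # last entry stays at zero
print(p_add, p_prop)
```

In this example additive compensation pushes the zero entry below zero, which illustrates why it can be unsuitable when the pattern of zeros in a transition matrix is structural.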


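The transition matrix We have derived compensation formulae for a single
probability vector $\mathbf{p}$. Now consider perturbation of a probability matrix $\mathbf{P}$, each
column of which is a probability vector. Define a perturbation matrix $\boldsymbol{\Theta}$ where $\theta_{ij}$
is the perturbation of $p_{ij}$. Perturbations of column $j$ are to be compensated by a
matrix $\mathbf{A}_j$, so that

$$\mathbf{P}(\boldsymbol{\Theta}) = \mathbf{P}(0) + \left[\mathbf{A}_1\boldsymbol{\Theta}(:,1) \;\cdots\; \mathbf{A}_s\boldsymbol{\Theta}(:,s)\right] \tag{11.76}$$

where $\mathbf{A}_i$ compensates for the changes in column $i$ of $\mathbf{P}$. Applying the vec operator
to (11.76) gives

$$\mathrm{vec}\,\mathbf{P}(\boldsymbol{\Theta}) = \mathrm{vec}\,\mathbf{P}(0) + \begin{pmatrix} \mathbf{A}_1 & & \\ & \ddots & \\ & & \mathbf{A}_s \end{pmatrix}\mathrm{vec}\,\boldsymbol{\Theta} \tag{11.77}$$

$$= \mathrm{vec}\,\mathbf{P}(0) + \sum_{i=1}^{s}\left(\mathbf{E}_{ii} \otimes \mathbf{A}_i\right)\mathrm{vec}\,\boldsymbol{\Theta}. \tag{11.78}$$

The terms in the summation in (11.78) are recognizable as the vec of the product
$\mathbf{A}_i\boldsymbol{\Theta}\mathbf{E}_{ii}$; thus

$$\mathbf{P}(\boldsymbol{\Theta}) = \mathbf{P}(0) + \sum_{i=1}^{s}\mathbf{A}_i\boldsymbol{\Theta}\mathbf{E}_{ii} \tag{11.79}$$

where $\mathbf{E}_{ii}$ is a matrix with a 1 in the $(i,i)$ entry and zeros elsewhere.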


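Theorem 11.4.5 Let $\mathbf{P}$ be a column-stochastic $s \times s$ transition matrix. Let $\boldsymbol{\Theta}$ be
a matrix of perturbations, where $\theta_{ij}$ is applied to $p_{ij}$, and the other entries of $\boldsymbol{\Theta}$
compensate for the perturbation. Let $\mathbf{C} = \mathbf{E} - \mathbf{I}$. If compensation is additive, then

$$\mathbf{P}(\boldsymbol{\Theta}) = \mathbf{P}(0) + \left(\mathbf{I} - \frac{1}{s-1}\mathbf{C}\right)\boldsymbol{\Theta} \tag{11.80}$$

$$\frac{d\mathrm{vec}\,\mathbf{P}}{d\mathrm{vec}^{\mathsf T}\boldsymbol{\Theta}} = \mathbf{I}_{s^2} - \frac{1}{s-1}\left(\mathbf{I}_s \otimes \mathbf{C}\right). \tag{11.81}$$

If compensation is proportional, then

$$\mathbf{P}(\boldsymbol{\Theta}) = \mathbf{P}(0) + \sum_{i=1}^{s}\left\{\mathbf{I} - \mathcal{D}\left[\mathbf{P}(:,i)\right]\mathbf{C}\,\mathcal{D}\left[\mathbf{1} - \mathbf{P}(:,i)\right]^{-1}\right\}\boldsymbol{\Theta}\mathbf{E}_{ii} \tag{11.82}$$

$$\frac{d\mathrm{vec}\,\mathbf{P}}{d\mathrm{vec}^{\mathsf T}\boldsymbol{\Theta}} = \mathbf{I}_{s^2} - \sum_{i=1}^{s}\left\{\mathbf{E}_{ii} \otimes \mathcal{D}\left[\mathbf{P}(:,i)\right]\mathbf{C}\,\mathcal{D}\left[\mathbf{1} - \mathbf{P}(:,i)\right]^{-1}\right\}. \tag{11.83}$$

Proof $\mathbf{P}(\boldsymbol{\Theta})$ is given by (11.79). If compensation is additive, $\mathbf{A}_i$ is given by (11.72)
for all $i$. Substituting into (11.79) gives (11.80). Differentiating (11.80) and applying
the vec operator gives (11.81).

If compensation is proportional, substituting (11.75) for $\mathbf{A}_i$ in (11.79)
gives (11.82). Differentiating yields

$$d\mathbf{P} = (d\boldsymbol{\Theta})\sum_{i=1}^{s}\mathbf{E}_{ii} - \sum_{i=1}^{s}\mathcal{D}\left[\mathbf{P}(:,i)\right]\mathbf{C}\,\mathcal{D}\left[\mathbf{1} - \mathbf{P}(:,i)\right]^{-1}(d\boldsymbol{\Theta})\mathbf{E}_{ii}. \tag{11.84}$$

Using the vec operator gives (11.83). □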


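Perturbations of $\mathbf{P}$ subject to compensation are given by perturbations of $\boldsymbol{\Theta}$. Thus
for any function $y(\mathbf{P})$ we can write

$$\left.\frac{dy}{d\mathrm{vec}^{\mathsf T}\mathbf{P}}\right|_{\rm comp} = \frac{dy}{d\mathrm{vec}^{\mathsf T}\mathbf{P}}\,\frac{d\mathrm{vec}\,\mathbf{P}}{d\mathrm{vec}^{\mathsf T}\boldsymbol{\Theta}} \tag{11.85}$$

where $d\mathrm{vec}\,\mathbf{P}/d\mathrm{vec}^{\mathsf T}\boldsymbol{\Theta}$ is given (for additive and proportional compensation) by
Theorem 11.4.5. The slight notational complexity is worthwhile for clarifying how
to use Theorem 11.4.5 in practice.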
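
The Jacobians (11.81) and (11.83) can be assembled directly from Kronecker products. The sketch below is a hedged illustration: the function name `compensation_jacobian`, the `kind` argument, and the 3-state matrix (which contains two zeros) are assumptions made for the example. Multiplying the Jacobian by $\mathrm{vec}\,\boldsymbol{\Theta}$ gives the compensated change in $\mathrm{vec}\,\mathbf{P}$.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major vec operator)."""
    return A.reshape(-1, order="F")

def unvec(v, s):
    """Inverse of vec for an s x s matrix."""
    return v.reshape((s, s), order="F")

def compensation_jacobian(P, kind="proportional"):
    """dvec P / dvec^T Theta: (11.81) if kind == 'additive', otherwise (11.83)."""
    s = P.shape[0]
    C = np.ones((s, s)) - np.eye(s)
    if kind == "additive":
        return np.eye(s * s) - np.kron(np.eye(s), C) / (s - 1)
    J = np.eye(s * s)
    for i in range(s):
        col = P[:, i]                      # column i of P; all entries must be < 1
        E_ii = np.zeros((s, s))
        E_ii[i, i] = 1.0
        J -= np.kron(E_ii, np.diag(col) @ C @ np.diag(1.0 / (1.0 - col)))
    return J

# Hypothetical column-stochastic matrix with structural zeros.
P = np.array([[0.6, 0.0, 0.1],
              [0.4, 0.5, 0.4],
              [0.0, 0.5, 0.5]])
Theta = np.zeros((3, 3))
Theta[0, 0] = 0.01                          # perturb p_11 only

dP_prop = unvec(compensation_jacobian(P) @ vec(Theta), 3)
dP_add = unvec(compensation_jacobian(P, "additive") @ vec(Theta), 3)
print(dP_prop, dP_add, sep="\n")
```

In this example the proportional pattern takes the whole compensation from $p_{21}$ and leaves the zero entry $p_{31}$ untouched, whereas the additive pattern spreads it equally over both entries.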


11.5 Species Succession in a Marine Community 


Markov chains are used by ecologists as models of species replacement (succession)
in ecological communities (e.g., Horn 1975; Hill et al. 2004; Nelis and Wootton
2010). In these models, the state of a point on a landscape is given by the species
occupying that point. The entry $p_{ij}$ of $\mathbf{P}$ is the probability that species $j$ is replaced
by species $i$ between $t$ and $t+1$. If a community consists of a large number of points
independently subject to the transition probabilities in $\mathbf{P}$, the stationary distribution
$\boldsymbol{\pi}$ will give the relative frequencies of species in the community at equilibrium.

Hill et al. (2004) used a Markov chain to describe a community of encrusting
organisms occupying rock surfaces at 30-35 m depth in the Gulf of Maine. The
Markov chain contained 14 species plus an additional state ("bare rock") for
unoccupied substrate. The matrix $\mathbf{P}$ was estimated from longitudinal data (Hill et al.
2002, 2004) and is given, along with a list of species names, in Appendix B. We
will use the results of this chapter to analyze the sensitivity of species diversity
and the Kemeny constant to the processes of colonization and replacement that
determine $\mathbf{P}$.


11.5.1 Biotic Diversity 


The stationary distribution $\boldsymbol{\pi}$, with the species numbered in order of decreasing
abundance and bare rock placed at the end as state 15, is shown in Fig. 11.2. The
two dominant species are an encrusting sponge (called Hymedesmia) and a bryozoan
(Crisia).

The entropy of this stationary distribution, $H(\boldsymbol{\pi}) = -\boldsymbol{\pi}^{\mathsf T}(\log \boldsymbol{\pi})$, where the
logarithm is applied elementwise, is used as an index of biodiversity; it is maximal
when all species are equally abundant and goes to 0 in a community dominated by
a single species. The sensitivity of $H$ is

$$dH = -\left(\log \boldsymbol{\pi}^{\mathsf T} + \mathbf{1}^{\mathsf T}\right) d\boldsymbol{\pi}. \tag{11.86}$$


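Most ecologists, however, would not include bare substrate in a measure of
biodiversity, so we define instead a "biotic diversity" $H_b(\boldsymbol{\pi}) = H(\boldsymbol{\pi}_b)$ where

$$\boldsymbol{\pi}_b = \frac{\mathbf{G}\boldsymbol{\pi}}{\|\mathbf{G}\boldsymbol{\pi}\|}. \tag{11.87}$$

Fig. 11.2 The stationary distribution for the subtidal benthic community
succession model of Hill et al. (2004). States 1-14 correspond to species,
numbered in decreasing order of abundance in the stationary distribution.
State 15 is bare rock, unoccupied by any species. For the identity of
species and the transition matrix, see Appendix B

The matrix $\mathbf{G}$, of dimension $14 \times 15$, is a 0-1 matrix that selects rows 1-14 of $\boldsymbol{\pi}$.
Because $\boldsymbol{\pi}$ is positive, $\|\mathbf{G}\boldsymbol{\pi}\| = \mathbf{1}^{\mathsf T}\mathbf{G}\boldsymbol{\pi}$. Differentiating $\boldsymbol{\pi}_b$ gives

$$d\boldsymbol{\pi}_b = \left(\frac{\mathbf{G}}{\mathbf{1}^{\mathsf T}\mathbf{G}\boldsymbol{\pi}} - \frac{\mathbf{G}\boldsymbol{\pi}\mathbf{1}^{\mathsf T}\mathbf{G}}{\left(\mathbf{1}^{\mathsf T}\mathbf{G}\boldsymbol{\pi}\right)^2}\right) d\boldsymbol{\pi} \tag{11.88}$$

which, because $\mathbf{G}\boldsymbol{\pi}/(\mathbf{1}^{\mathsf T}\mathbf{G}\boldsymbol{\pi}) = \boldsymbol{\pi}_b$, simplifies to

$$d\boldsymbol{\pi}_b = \frac{1}{\mathbf{1}^{\mathsf T}\mathbf{G}\boldsymbol{\pi}}\left(\mathbf{I}_{14} - \boldsymbol{\pi}_b\mathbf{1}^{\mathsf T}\right)\mathbf{G}\, d\boldsymbol{\pi}. \tag{11.89}$$
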
This model contains no explicit parameters; perturbations of the transition
probabilities themselves are of interest and a compensation pattern is needed.
The relative magnitudes of the entries in a column of $\mathbf{P}$ reflect the relative
abilities of species to capture or to hold space. Proportional compensation
preserves these relative abilities, and so is appropriate in this case.

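The sensitivity and elasticity of the biotic diversity $H_b$ to changes in the matrix
$\mathbf{P}$, subject to proportional compensation, are

$$\left.\frac{dH_b}{d\mathrm{vec}^{\mathsf T}\mathbf{P}}\right|_{\rm comp} = \frac{dH_b}{d\boldsymbol{\pi}_b^{\mathsf T}}\;\frac{d\boldsymbol{\pi}_b}{d\boldsymbol{\pi}^{\mathsf T}}\;\frac{d\boldsymbol{\pi}}{d\mathrm{vec}^{\mathsf T}\mathbf{P}}\;\frac{d\mathrm{vec}\,\mathbf{P}}{d\mathrm{vec}^{\mathsf T}\boldsymbol{\Theta}} \tag{11.90}$$

$$\left.\frac{\epsilon H_b}{\epsilon\,\mathrm{vec}^{\mathsf T}\mathbf{P}}\right|_{\rm comp} = \frac{1}{H_b}\left.\frac{dH_b}{d\mathrm{vec}^{\mathsf T}\mathbf{P}}\right|_{\rm comp}\mathcal{D}(\mathrm{vec}\,\mathbf{P}). \tag{11.91}$$

Term 1 on the right hand side of (11.90) is the derivative of $H_b$ with respect to
$\boldsymbol{\pi}_b$, and is given by (11.86). Term 2 is the derivative of the biotic diversity vector
$\boldsymbol{\pi}_b$ with respect to the full diversity vector $\boldsymbol{\pi}$, given by (11.89). Term 3 is the
derivative of the diversity vector $\boldsymbol{\pi}$ with respect to the transition matrix $\mathbf{P}$, given
by (11.55). Finally, Term 4 is the derivative of the matrix $\mathbf{P}$ taking into account the
compensation structure in (11.83).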
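
The four-term product in (11.90) can be sketched numerically as follows. This is an illustration on a hypothetical 3-state chain in which the last state plays the role of bare rock; it is not the 15-state analysis of the text. Term 3 is taken to be $\boldsymbol{\pi}^{\mathsf T} \otimes (\mathbf{Z} - \boldsymbol{\pi}\mathbf{1}^{\mathsf T})$, which is an assumption about the form of (11.55) based on the group-inverse identity quoted earlier in the chapter. All function names are illustrative.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a column-stochastic P (P @ pi = pi)."""
    s = P.shape[0]
    return np.linalg.solve(np.eye(s) - P + np.ones((s, s)), np.ones(s))

def biotic_diversity(P, G):
    """Entropy of the stationary distribution restricted to the rows selected by G."""
    pb = G @ stationary(P)
    pb = pb / pb.sum()
    return -pb @ np.log(pb)

def proportional_jacobian(P):
    """Term 4: dvec P / dvec^T Theta under proportional compensation, as in (11.83)."""
    s = P.shape[0]
    C = np.ones((s, s)) - np.eye(s)
    J = np.eye(s * s)
    for i in range(s):
        col = P[:, i]
        E_ii = np.zeros((s, s))
        E_ii[i, i] = 1.0
        J -= np.kron(E_ii, np.diag(col) @ C @ np.diag(1.0 / (1.0 - col)))
    return J

def dHb_dvecP_comp(P, G):
    """Product of Terms 1 to 4 in (11.90); returns a vector of length s**2."""
    s = P.shape[0]
    one = np.ones(s)
    pi = stationary(P)
    Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, one))
    g = G @ pi
    total = g.sum()
    term1 = -(np.log(g / total) + 1.0)                                     # (11.86) at pi_b
    term2 = G / total - np.outer(g, np.ones(G.shape[0]) @ G) / total**2    # (11.88)
    term3 = np.kron(pi[None, :], Z - np.outer(pi, one))                    # assumed form of (11.55)
    term4 = proportional_jacobian(P)
    return term1 @ term2 @ term3 @ term4

# Hypothetical chain: states 1 and 2 are species, state 3 is bare rock.
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.4],
              [0.1, 0.3, 0.5]])
G = np.eye(3)[:2]            # selects the two biotic states
sens = dHb_dvecP_comp(P, G)
elas = sens * P.reshape(-1, order="F") / biotic_diversity(P, G)   # (11.91)
print(sens.reshape((3, 3), order="F"))
```

Entry $(k, j)$ of the reshaped result is the sensitivity of $H_b$ to $p_{kj}$ when the rest of column $j$ compensates proportionally.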

The sensitivity and elasticity vectors (11.90) and (11.91) are of dimension
$1 \times s^2 = 1 \times 225$. To reduce the number of independent perturbations, we
consider subsets of the $p_{ij}$: disturbance (in which a species is replaced by bare
rock), colonization of unoccupied space, replacement of one species by another,
and persistence of a species in its location, where

$$\begin{aligned}
P[\text{disturbance of sp. } i] &= p_{si} \\
P[\text{colonization by sp. } i] &= p_{is} \\
P[\text{persistence of sp. } i] &= p_{ii} \\
P[\text{replacement of sp. } i] &= \sum_{k \neq i,s} p_{ki} \\
P[\text{replacement by sp. } i] &= \sum_{j \neq i,s} p_{ij}.
\end{aligned}$$

Fig. 11.3 The elasticity of the biotic diversity $H_b(\boldsymbol{\pi})$ calculated over the biotic states of the
stationary distribution of the subtidal benthic community succession model of Hill et al. (2004).
States 1-14 correspond to species, numbered in decreasing order of abundance in the stationary
distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the
transition matrix, see Appendix B


Extracting the corresponding elements of the elasticity vector (11.91) gives the
elasticities to these classes of probabilities. Figure 11.3 shows that the dominant
species (1 and 2) have impacts that are larger than, and opposite in sign to, those
of the remaining species. Biodiversity would be enhanced by increasing the
disturbance of, or the replacement of, species 1 and 2, and reduced by increasing
the rates of colonization by, persistence of, or replacement by species 1 and 2.


11.5.2 The Kemeny Constant and Ecological Mixing 


Ecologists have used several measures of the rate of convergence of communities
modelled by Markov chains, including the damping ratio and Dobrushin's
coefficient of ergodicity (Hill et al. 2004). The Kemeny constant $K$ is an interesting
addition to this list; it gives the expected time to get from any initial state to
a state selected at random from the stationary distribution (Hunter 2006). Once
that state is reached, the behavior of the chain and the stationary process are
indistinguishable.

Fig. 11.4 The sensitivity of the Kemeny constant $K$ of the subtidal benthic community succession
model of Hill et al. (2004). States 1-14 correspond to species, numbered in decreasing order of
abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species. For the
identity of species and the transition matrix, see Appendix B

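The sensitivity of $K$, subject to compensation, is

$$\left.\frac{dK}{d\mathrm{vec}^{\mathsf T}\mathbf{P}}\right|_{\rm comp} = \frac{dK}{d\mathrm{vec}^{\mathsf T}\mathbf{Z}}\;\frac{d\mathrm{vec}\,\mathbf{Z}}{d\mathrm{vec}^{\mathsf T}\mathbf{P}}\;\frac{d\mathrm{vec}\,\mathbf{P}}{d\mathrm{vec}^{\mathsf T}\boldsymbol{\Theta}} \tag{11.92}$$

where the three terms on the right hand side are given by (11.65), (11.57),
and (11.83), respectively.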

Figure 11.4 shows the sensitivities $dK/d\mathrm{vec}^{\mathsf T}\mathbf{P}$, subject to proportional
compensation, and aggregated as in Fig. 11.3. Unlike the case with $H_b$, the
two dominant species do not stand out from the others. Increases in the rates
of replacement will speed up convergence, and increases in persistence will
slow convergence. The disturbance of, colonization by, persistence of, and
replacement of species 6 (the sea anemone Urticina crassicornis) have
particularly large impacts on $K$. Examination of row 6 and column 6 of $\mathbf{P}$
(Appendix B) shows that U. crassicornis has the highest probability of persistence
($p_{66} = 0.86$), and one of the lowest rates of disturbance, in the community.
While it is far from dominant (Fig. 11.2), it has a major impact on the rate
of mixing.


11.6 Discussion


Given that many properties of finite state Markov chains can be expressed as 
simple matrix expressions, matrix calculus is an attractive approach to finding 
the sensitivity and elasticity to parameter perturbations. Most of the literature on 
perturbation analysis of Markov chains has focused on the stationary distribution 
of ergodic chains, but the approach here is equally applicable to absorbing chains, 
and to dependent variables other than the stationary distribution. The perturbation 
of ergodic chains is often studied using generalized inverses, since the influential 
studies of Meyer (Meyer 1975, 1994; Golub and Meyer 1986; Funderlic and Meyer 
1986). Matrix calculus provides a complementary approach; the sensitivity of the
stationary distribution $\boldsymbol{\pi}$ obtained here agrees with the result obtained by Golub and
Meyer (1986) using the group generalized inverse.

The examples shown here are typical of cases where absorbing or ergodic 
Markov chains are used in population biology and ecology. In each example, the 
dependent variables of interest are functions several steps removed from the chain 
itself. The ease with which one can differentiate such functions is a particularly 
attractive property of the matrix calculus approach. 


A Appendix A: Proofs 


Theorems 11.2.1 and 11.2.2 give the sensitivities of the moments of the number of 
visits to transient states and of the time to absorption, respectively. These results are 
obtained by applying matrix calculus to the expressions for the moments. Proofs are 
given in the text for the first two moments; the proofs for the others follow the same 
steps but introduce no new concepts, and so are presented here. 


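A.1 Derivatives of the Moments of Occupancy Times


To continue the proof of Theorem 11.2.1, take partial differentials of $\mathbf{N}_3$ in (11.5)
with respect to $\mathbf{N}_1$ and $\mathbf{N}_{\rm dg}$, to obtain

$$\partial_{\mathbf{N}_1}\mathbf{N}_3 = \left(6\mathbf{N}_{\rm dg}^2 - 6\mathbf{N}_{\rm dg} + \mathbf{I}\right) d\mathbf{N}_1 \tag{11.93}$$

$$\partial_{\mathbf{N}_{\rm dg}}\mathbf{N}_3 = 6\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_{\rm dg}\mathbf{N}_1 + 6\,\mathbf{N}_{\rm dg}\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_1 - 6\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_1 \tag{11.94}$$

Applying the vec operator to each term and using Roth's theorem gives

$$\partial_{\mathbf{N}_1}\mathrm{vec}\,\mathbf{N}_3 = \left[\mathbf{I} \otimes \left(6\mathbf{N}_{\rm dg}^2 - 6\mathbf{N}_{\rm dg} + \mathbf{I}\right)\right] d\mathrm{vec}\,\mathbf{N}_1 \tag{11.95}$$

$$\partial_{\mathbf{N}_{\rm dg}}\mathrm{vec}\,\mathbf{N}_3 = \left[6\left(\mathbf{N}_1^{\mathsf T}\mathbf{N}_{\rm dg} \otimes \mathbf{I}\right) + 6\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_{\rm dg}\right) - 6\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{I}\right)\right] d\mathrm{vec}\,\mathbf{N}_{\rm dg}. \tag{11.96}$$

Substituting (11.95) and (11.96) into (11.13) gives (11.9).
Taking partial differentials of $\mathbf{N}_4$ in (11.6) gives

$$\partial_{\mathbf{N}_1}\mathbf{N}_4 = \left(24\mathbf{N}_{\rm dg}^3 - 36\mathbf{N}_{\rm dg}^2 + 14\mathbf{N}_{\rm dg} - \mathbf{I}\right) d\mathbf{N}_1 \tag{11.97}$$

$$\begin{aligned}
\partial_{\mathbf{N}_{\rm dg}}\mathbf{N}_4 ={}& 24\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_{\rm dg}^2\mathbf{N}_1 + 24\,\mathbf{N}_{\rm dg}\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_{\rm dg}\mathbf{N}_1 + 24\,\mathbf{N}_{\rm dg}^2\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_1 \\
& - 36\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_{\rm dg}\mathbf{N}_1 - 36\,\mathbf{N}_{\rm dg}\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_1 + 14\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_1.
\end{aligned} \tag{11.98}$$

Applying the vec operator yields

$$\partial_{\mathbf{N}_1}\mathrm{vec}\,\mathbf{N}_4 = \left[\mathbf{I} \otimes \left(24\mathbf{N}_{\rm dg}^3 - 36\mathbf{N}_{\rm dg}^2 + 14\mathbf{N}_{\rm dg} - \mathbf{I}\right)\right] d\mathrm{vec}\,\mathbf{N}_1 \tag{11.99}$$

$$\begin{aligned}
\partial_{\mathbf{N}_{\rm dg}}\mathrm{vec}\,\mathbf{N}_4 = \big[& 24\left(\mathbf{N}_1^{\mathsf T}\mathbf{N}_{\rm dg}^2 \otimes \mathbf{I}\right) + 24\left(\mathbf{N}_1^{\mathsf T}\mathbf{N}_{\rm dg} \otimes \mathbf{N}_{\rm dg}\right) + 24\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_{\rm dg}^2\right) \\
& - 36\left(\mathbf{N}_1^{\mathsf T}\mathbf{N}_{\rm dg} \otimes \mathbf{I}\right) - 36\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_{\rm dg}\right) + 14\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{I}\right)\big]\, d\mathrm{vec}\,\mathbf{N}_{\rm dg}.
\end{aligned} \tag{11.100}$$

Substituting (11.99) and (11.100) into (11.13) gives (11.10).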


A.2 Derivatives of the Moments of Time to Absorption 


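To continue the proof of Theorem 11.2.2, take partial differentials of $\boldsymbol{\eta}_3$ in (11.21)
with respect to $\boldsymbol{\eta}_1$ and $\mathbf{N}_1$, to obtain

$$\partial_{\boldsymbol{\eta}_1}\boldsymbol{\eta}_3^{\mathsf T} = \left(d\boldsymbol{\eta}_1^{\mathsf T}\right)\left(6\mathbf{N}_1^2 - 6\mathbf{N}_1 + \mathbf{I}\right) \tag{11.101}$$

$$\partial_{\mathbf{N}_1}\boldsymbol{\eta}_3^{\mathsf T} = 6\boldsymbol{\eta}_1^{\mathsf T}(d\mathbf{N}_1)\mathbf{N}_1 + 6\boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1(d\mathbf{N}_1) - 6\boldsymbol{\eta}_1^{\mathsf T}(d\mathbf{N}_1). \tag{11.102}$$

Applying the vec operator yields

$$\partial_{\boldsymbol{\eta}_1}\boldsymbol{\eta}_3 = \left(6\mathbf{N}_1^2 - 6\mathbf{N}_1 + \mathbf{I}\right)^{\mathsf T} d\boldsymbol{\eta}_1 \tag{11.103}$$

$$\partial_{\mathbf{N}_1}\boldsymbol{\eta}_3 = \left[6\left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right) + 6\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1\right) - 6\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right)\right] d\mathrm{vec}\,\mathbf{N}_1 \tag{11.104}$$

which combine to yield (11.25).

The partial differentials of $\boldsymbol{\eta}_4$ in (11.22) with respect to $\boldsymbol{\eta}_1$ and $\mathbf{N}_1$ are

$$\partial_{\boldsymbol{\eta}_1}\boldsymbol{\eta}_4^{\mathsf T} = d\boldsymbol{\eta}_1^{\mathsf T}\left(24\mathbf{N}_1^3 - 36\mathbf{N}_1^2 + 14\mathbf{N}_1 - \mathbf{I}\right) \tag{11.105}$$

$$\begin{aligned}
\partial_{\mathbf{N}_1}\boldsymbol{\eta}_4^{\mathsf T} = \boldsymbol{\eta}_1^{\mathsf T}\big[& 24\,(d\mathbf{N}_1)\,\mathbf{N}_1^2 + 24\,\mathbf{N}_1(d\mathbf{N}_1)\,\mathbf{N}_1 + 24\,\mathbf{N}_1^2\,(d\mathbf{N}_1) \\
& - 36\,(d\mathbf{N}_1)\,\mathbf{N}_1 - 36\,\mathbf{N}_1(d\mathbf{N}_1) + 14\,d\mathbf{N}_1\big].
\end{aligned} \tag{11.106}$$

Applying the vec operator to each equation gives

$$\partial_{\boldsymbol{\eta}_1}\boldsymbol{\eta}_4 = \left(24\mathbf{N}_1^3 - 36\mathbf{N}_1^2 + 14\mathbf{N}_1 - \mathbf{I}\right)^{\mathsf T} d\boldsymbol{\eta}_1 \tag{11.107}$$

$$\begin{aligned}
\partial_{\mathbf{N}_1}\boldsymbol{\eta}_4 = \big\{& 24\left[\left(\mathbf{N}_1^2\right)^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right] + 24\left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1\right) + 24\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1^2\right) \\
& - 36\left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right) - 36\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1\right) + 14\left(\mathbf{I} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right)\big\}\, d\mathrm{vec}\,\mathbf{N}_1
\end{aligned} \tag{11.108}$$

which combine to give (11.26).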


B Appendix B: Marine Community Matrix 


Model state                     Species type      State ID  Number
 1  Hymedesmia 1 sp.            Sponge            HY1       14875
 2  Crisia eburnea              Bryozoan          CRI        9915
 3  Myxilla fimbriata           Sponge            MYX        4525
 4  Mycale lingua               Sponge            MYC        3001
 5  Filograna implexa           Polychaete        FIL        2219
 6  Urticina crassicornis       Sea anemone       URT         992
 7  Ascidia callosa             Ascidian          ASC        1052
 8  Aplidium pallidum           Ascidian          APL        1166
 9  Hymedesmia 2 sp.            Sponge            HY2        1226
10  Idmidronea atlantica        Bryozoan          IDM         730
11  Coralline Algae             Encrusting algae  COR         875
12  Metridium senile            Sea anemone       MET        1298
13  Parasmittina jeffreysi      Bryozoan          PAR         402
14  Spirorbis spirorbis         Polychaete        SPI         225
15  Bare Rock                                     BR         4266

The transition matrix for the marine benthic community (Hill et al. 2004) is


P= 


0.771 0.145 0.052 0.017 0.117 0.009 0.241 0.199 0.056 0.309 0.056 0.025 0.321 0.158 0.101 
0.102 0.609 0.061 0.054 0.218 0.024 0.223 0.235 0.147 0.228 0.222 0.068 0.179 0.448 0.320 
0.017 0.031 0.710 0.006 0.035 0.012 0.051 0.038 0.026 0.031 0.028 0.018 0.023 0.018 0.025 
0.004 0.011 0.004 0.839 0.004 0.000 0.016 0.018 0.011 0.010 0.008 0.030 0.000 0.018 0.009 
0.015 0.028 0.020 0.005 0.404 0.016 0.080 0.089 0.020 0.027 0.036 0.016 0.063 0.085 0.062 
0.001 0.005 0.004 0.000 0.008 0.863 0.024 0.007 0.006 0.006 0.000 0.000 0.000 0.006 0.005 
0.018 0.022 0.008 0.004 0.033 0.001 0.105 0.044 0.011 0.042 0.025 0.010 0.030 0.030 0.048 
0.012 0.025 0.008 0.006 0.032 0.007 0.041 0.154 0.026 0.031 0.020 0.016 0.020 0.018 0.034 
0.002 0.011 0.025 0.008 0.013 0.016 0.014 0.015 0.586 0.010 0.007 0.004 0.003 0.018 0.013 
0.014 0.015 0.003 0.004 0.007 0.003 0.033 0.027 0.021 0.165 0.007 0.003 0.020 0.030 0.031 
0.003 0.012 0.005 0.006 0.006 0.004 0.025 0.016 0.006 0.013 0.507 0.001 0.017 0.006 0.017 
0.002 0.008 0.007 0.011 0.005 0.007 0.005 0.020 0.005 0.008 0.002 0.537 0.000 0.006 0.017 
0.005 0.005 0.002 0.000 0.006 0.000 0.014 0.009 0.001 0.012 0.005 0.003 0.248 0.000 0.011 
0.003 0.004 0.008 0.003 0.005 0.000 0.012 0.009 0.005 0.006 0.003 0.003 0.000 0.030 0.013 
0.029 0.069 0.084 0.036 0.108 0.036 0.115 0.122 0.074 0.104 0.076 0.266 0.076 0.127 0.294 


(11.109)


Chapter 12
Sensitivity Analysis of Continuous Markov Chains


12.1 Introduction 


When Markov chains are used as mathematical models of natural or social 
phenomena, the transition intensities or probabilities are usually defined in terms of 
parameters that are relevant to the scientific question at hand. Sensitivity analysis 
of such models is important because it quantifies the dependence of the model 
behavior on the parameters. This chapter presents sensitivity results for finite-state, 
continuous-time absorbing Markov chains, paralleling the approach for discrete- 
time chains in Chap. 11. In absorbing chains, interest focuses on behavior prior 
to absorption (time spent in transient states and time to absorption) and on the 
probabilities of absorption in each absorbing state. Here we will derive formulae 
for the sensitivity and the elasticity (i.e., proportional sensitivity) of the moments 
of the time to absorption, the time spent in each transient state, and the number of 
visits to each transient state. 

The most basic difference between discrete-time and continuous-time Markov 
chains is that the former are defined by transition probabilities, while the latter are 
defined by transition rates. This leads to differences in the structure of the matrices, 
but there is a nice parallelism in the results. 

Perturbation analysis of Markov chains has a long history (Schweitzer 1968; 
Meyer 1975). Most of the literature, however, is devoted to discrete-time chains, 
and most of that focuses on ergodic chains and the perturbation analysis of the 
stationary distribution; e.g. Funderlic and Meyer (1986), Golub and Meyer (1986), 
Hunter (2005), Cho and Meyer (2000), and Seneta (1993). Much less attention has 
been paid to continuous-time chains. Perturbation expansions have been developed
for the stationary distribution of ergodic continuous-time chains, with application
to queueing models (Altman et al. 2004), and sensitivity results and perturbation 
bounds presented for transient solutions (Ramesh and Trivedi 1993; Mitrophanov 
2004). The operations research literature contains many studies of the sensitivity 
of performance measures calculated over realizations of a continuous-time ergodic 
Markov chain; e.g., Cao (1989), Glasserman (1992), and Cao et al. (1996). The 
results to be presented here complement and extend the existing literature on 
perturbation analysis of Markov chains, by focusing on the statistical properties of 
the solutions of absorbing continuous-time chains, by introducing the use of matrix 
calculus, and (as a consequence of that technique) extending the range of parameters 
whose effects can be evaluated. 


12.1.1 Absorbing Markov Chains 


I consider a finite state, homogeneous, continuous-time Markov chain with intensity
matrix $\mathbf{Q}$, where $q_{ij}$ is the rate of transition from stage $j$ to stage $i$. The intensity
matrix satisfies $q_{ij} \geq 0$ for $i \neq j$ and $q_{jj} = -\sum_{i \neq j} q_{ij}$. Note that $\mathbf{Q}$ is written
in column-to-row orientation, and operates on column vectors. An absorbing chain
contains at least one absorbing class of states. Numbering the states so that the
transient states appear before the absorbing states leads to the intensity matrix

$$\mathbf{Q} = \left(\begin{array}{c|c} \mathbf{U} & \mathbf{0} \\ \hline \mathbf{M} & \mathbf{0} \end{array}\right). \tag{12.1}$$

The matrix $\mathbf{U}$ contains rates of transitions among the transient states, and $\mathbf{M}$ contains
the rates of transition from transient to absorbing states.

I assume that $\mathbf{U}$ and $\mathbf{M}$ are differentiable functions of a vector $\boldsymbol{\theta}$ of parameters,
and that $\mathbf{Q}[\boldsymbol{\theta}]$ remains an intensity matrix for sufficiently small perturbations of $\boldsymbol{\theta}$.
This includes as a special case the situation where the elements of $\boldsymbol{\theta}$ are simply
some or all of the $q_{ij}$, $i \neq j$. The goal of the perturbation analysis is to obtain the
derivatives of properties of the chain with respect to $\boldsymbol{\theta}$.


12.2 Occupancy Time in Transient States 


Let $s$ be the number of transient states, and $\nu_{ij}$ be the time spent in transient state $i$ by
an individual starting in transient state $j$. Define $\mathbf{N}_k = \left(E\left(\nu_{ij}^k\right)\right)$ as the matrix whose
entries are the $k$th moments, and $\mathbf{N}_{\rm dg} = (\mathbf{N}_1)_{\rm dg}$. The matrix $\mathbf{N}_1$ of expectations is
the fundamental matrix of the chain. The first several moments of occupancy times
are given by the entries of the matrices

$$\mathbf{N}_1 = -\mathbf{U}^{-1} \tag{12.2}$$

$$\mathbf{N}_2 = 2\mathbf{N}_{\rm dg}\mathbf{N}_1 \tag{12.3}$$

$$\mathbf{N}_3 = 6\mathbf{N}_{\rm dg}^2\mathbf{N}_1 \tag{12.4}$$

$$\mathbf{N}_4 = 24\mathbf{N}_{\rm dg}^3\mathbf{N}_1 \tag{12.5}$$

and, in general, by

$$\mathbf{N}_k = k\,\mathbf{N}_{\rm dg}\mathbf{N}_{k-1} \qquad k \geq 2 \tag{12.6}$$

(Iosifescu 1980, Thm. 8.7).
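The differentials of the moments (12.2), (12.3), (12.4), and (12.5) are

$$d\mathrm{vec}\,\mathbf{N}_1 = \left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_1\right) d\mathrm{vec}\,\mathbf{U} \tag{12.7}$$

$$d\mathrm{vec}\,\mathbf{N}_2 = 2\left[\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{I}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I}) + \left(\mathbf{I} \otimes \mathbf{N}_{\rm dg}\right)\right]\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_1\right) d\mathrm{vec}\,\mathbf{U} \tag{12.8}$$

$$d\mathrm{vec}\,\mathbf{N}_3 = 6\left\{2\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_{\rm dg}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I}) + \left(\mathbf{I} \otimes \mathbf{N}_{\rm dg}^2\right)\right\}\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_1\right) d\mathrm{vec}\,\mathbf{U} \tag{12.9}$$

$$d\mathrm{vec}\,\mathbf{N}_4 = 24\left\{3\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_{\rm dg}^2\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I}) + \left(\mathbf{I} \otimes \mathbf{N}_{\rm dg}^3\right)\right\}\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_1\right) d\mathrm{vec}\,\mathbf{U} \tag{12.10}$$

where $\mathbf{I} = \mathbf{I}_s$ throughout. A recursive relation for all the moments is

$$d\mathrm{vec}\,\mathbf{N}_k = k\left(\mathbf{N}_{k-1}^{\mathsf T} \otimes \mathbf{I}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I})\, d\mathrm{vec}\,\mathbf{N}_1 + k\left(\mathbf{I} \otimes \mathbf{N}_{\rm dg}\right) d\mathrm{vec}\,\mathbf{N}_{k-1} \qquad k \geq 2. \tag{12.11}$$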

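The variance, standard deviation, and coefficient of variation of the $\nu_{ij}$ are
important in applications; they are

$$V(\nu_{ij}) = \mathbf{N}_2 - \mathbf{N}_1 \circ \mathbf{N}_1 \tag{12.12}$$

$$SD(\nu_{ij}) = \sqrt{V(\nu_{ij})} \tag{12.13}$$

$$CV(\nu_{ij}) = \mathcal{D}(\mathrm{vec}\,\mathbf{N}_1)^{-1}\,\mathrm{vec}\,SD(\nu_{ij}) \tag{12.14}$$

where the square root is taken elementwise. Their derivatives are

$$d\mathrm{vec}\,V = 2\left[\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{I}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I}) + \left(\mathbf{I} \otimes \mathbf{N}_{\rm dg}\right) - \mathcal{D}(\mathrm{vec}\,\mathbf{N}_1)\right] d\mathrm{vec}\,\mathbf{N}_1 \tag{12.15}$$

$$d\mathrm{vec}\,SD = \frac{1}{2}\,\mathcal{D}\left[\mathrm{vec}\,SD(\nu_{ij})\right]^{-1} d\mathrm{vec}\,V \tag{12.16}$$

$$d\mathrm{vec}\,CV = \mathcal{D}(\mathrm{vec}\,\mathbf{N}_1)^{-1}\, d\mathrm{vec}\,SD - \left[(\mathrm{vec}\,SD)^{\mathsf T}\mathcal{D}(\mathrm{vec}\,\mathbf{N}_1)^{-1} \otimes \mathcal{D}(\mathrm{vec}\,\mathbf{N}_1)^{-1}\right]\mathcal{D}(\mathrm{vec}\,\mathbf{I}_{s^2})\left(\mathbf{1}_{s^2} \otimes \mathbf{I}_{s^2}\right) d\mathrm{vec}\,\mathbf{N}_1 \tag{12.17}$$

(suppressing the arguments of $V$, $SD$ and $CV$). Because $\mathbf{N}_1$ usually contains zeros,
$\mathcal{D}(\mathrm{vec}\,\mathbf{N}_1)^{-1}$ must be restricted to the non-zero entries; the coefficient of variation
is undefined if the mean is zero.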
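
A small numerical sketch may help fix the notation. It computes $\mathbf{N}_1$, $\mathbf{N}_2$ and $V$ from (12.2), (12.3) and (12.12) for a hypothetical chain with two transient states, then checks the Jacobian obtained by combining (12.15) with (12.7) against a central finite difference. The rate matrix `U`, the perturbation `dU`, and the function names are assumptions made for the example.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major vec operator)."""
    return A.reshape(-1, order="F")

def occupancy(U):
    """Mean N1, second moment N2, and variance V of occupancy times."""
    N1 = -np.linalg.inv(U)                  # (12.2)
    N_dg = np.diag(np.diag(N1))
    N2 = 2.0 * N_dg @ N1                    # (12.3)
    return N1, N2, N2 - N1 * N1             # (12.12), Hadamard product

def dvecV_dvecU(U):
    """Jacobian dvec V / dvec^T U from (12.15) combined with (12.7)."""
    s = U.shape[0]
    N1, _, _ = occupancy(U)
    N_dg = np.diag(np.diag(N1))
    I = np.eye(s)
    dV_dN1 = 2.0 * (np.kron(N1.T, I) @ np.diag(vec(I))
                    + np.kron(I, N_dg) - np.diag(vec(N1)))
    return dV_dN1 @ np.kron(N1.T, N1)

# Hypothetical transient block: exit rates 3 and 4, absorption rates 1 and 3.
U = np.array([[-3.0, 1.0],
              [2.0, -4.0]])
N1, N2, V = occupancy(U)

dU = np.array([[0.3, -0.2],
               [0.1, 0.5]])                 # arbitrary perturbation direction
eps = 1e-6
fd = (occupancy(U + eps * dU)[2] - occupancy(U - eps * dU)[2]) / (2 * eps)
analytic = dvecV_dvecU(U) @ vec(dU)
print(N1, V, sep="\n")
```

For this matrix the mean time spent in state 1 by an individual starting there is 0.4, with variance $0.16 = 0.4^2$, as expected for an exponentially distributed total occupancy.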


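Derivation The fundamental matrix is $\mathbf{N}_1 = -\mathbf{U}^{-1}$. Applying (2.82) yields (12.7).
The derivatives of the higher moments are obtained by differentiating $\mathbf{N}_2$ to $\mathbf{N}_4$
in (12.3), (12.4), and (12.5). For example, the differential of $\mathbf{N}_4$ is

$$d\mathbf{N}_4 = 24\left[3\mathbf{N}_{\rm dg}^2\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_1 + \mathbf{N}_{\rm dg}^3\,(d\mathbf{N}_1)\right] \tag{12.18}$$

using the fact that $\mathbf{N}_{\rm dg}$ commutes with itself and $d\mathbf{N}_{\rm dg}$. Applying the vec operator
gives

$$d\mathrm{vec}\,\mathbf{N}_4 = 24\left[3\left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{N}_{\rm dg}^2\right) d\mathrm{vec}\,\mathbf{N}_{\rm dg} + \left(\mathbf{I}_s \otimes \mathbf{N}_{\rm dg}^3\right) d\mathrm{vec}\,\mathbf{N}_1\right]. \tag{12.19}$$

Substituting (11.12) for $d\mathrm{vec}\,\mathbf{N}_{\rm dg}$ and (12.7) for $d\mathrm{vec}\,\mathbf{N}_1$ gives (12.10).
Results (12.8) and (12.9) are obtained in similar fashion.

Differentiating the recurrence relationship (12.6) gives

$$d\mathbf{N}_k = k\,(d\mathbf{N}_{\rm dg})\,\mathbf{N}_{k-1} + k\,\mathbf{N}_{\rm dg}\,(d\mathbf{N}_{k-1}). \tag{12.20}$$

Apply the vec operator,

$$d\mathrm{vec}\,\mathbf{N}_k = k\left(\mathbf{N}_{k-1}^{\mathsf T} \otimes \mathbf{I}_s\right) d\mathrm{vec}\,\mathbf{N}_{\rm dg} + k\left(\mathbf{I}_s \otimes \mathbf{N}_{\rm dg}\right) d\mathrm{vec}\,\mathbf{N}_{k-1}, \tag{12.21}$$

and substitute (11.12) for $d\mathrm{vec}\,\mathbf{N}_{\rm dg}$ to obtain (12.11).

The derivative of $V$ in (12.15) comes from differentiating (12.12),

$$dV = d\mathbf{N}_2 - 2\mathbf{N}_1 \circ d\mathbf{N}_1, \tag{12.22}$$

applying the vec operator,

$$d\mathrm{vec}\,V = d\mathrm{vec}\,\mathbf{N}_2 - 2\,\mathcal{D}(\mathrm{vec}\,\mathbf{N}_1)\, d\mathrm{vec}\,\mathbf{N}_1, \tag{12.23}$$

and then using (12.7) and (12.8). The derivative of $SD(\nu_{ij})$ in (12.16) follows
from (2.83). The derivative of $CV(\nu_{ij})$ in (12.17) is obtained using (2.84), with
$\mathbf{x} = \mathrm{vec}\,SD$ and $\mathbf{y} = \mathrm{vec}\,\mathbf{N}_1$.


12.3 Longevity: Time to Absorption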


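Let $\eta_j$ be the time to absorption for an individual currently in transient state $j$. The
vectors of the $k$th moments of the time to absorption, $\boldsymbol{\eta}_k$, satisfy

$$\boldsymbol{\eta}_1^{\mathsf T} = \mathbf{1}^{\mathsf T}\mathbf{N}_1 \tag{12.24}$$

$$\boldsymbol{\eta}_2^{\mathsf T} = (2)\,\mathbf{1}^{\mathsf T}\mathbf{N}_1^2 \tag{12.25}$$

$$\boldsymbol{\eta}_3^{\mathsf T} = (6)\,\mathbf{1}^{\mathsf T}\mathbf{N}_1^3 \tag{12.26}$$

$$\boldsymbol{\eta}_4^{\mathsf T} = (24)\,\mathbf{1}^{\mathsf T}\mathbf{N}_1^4 \tag{12.27}$$

and in general

$$\boldsymbol{\eta}_k^{\mathsf T} = k\,\boldsymbol{\eta}_{k-1}^{\mathsf T}\mathbf{N}_1 \qquad k \geq 2 \tag{12.28}$$

(Iosifescu 1980, Thm. 8.6).
The variance, standard deviation, and coefficient of variation of the time to
absorption are

$$V(\boldsymbol{\eta}) = \boldsymbol{\eta}_2 - \boldsymbol{\eta}_1 \circ \boldsymbol{\eta}_1 \tag{12.29}$$

$$SD(\boldsymbol{\eta}) = \sqrt{V(\boldsymbol{\eta})} \tag{12.30}$$

$$CV(\boldsymbol{\eta}) = \mathcal{D}(\boldsymbol{\eta}_1)^{-1}\,SD(\boldsymbol{\eta}) \tag{12.31}$$

with the square root taken elementwise.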
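The derivatives of the moments in (12.24), (12.25), (12.26), and (12.27) are given
by

$$d\boldsymbol{\eta}_1 = \left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right) d\mathrm{vec}\,\mathbf{U} \tag{12.32}$$

$$d\boldsymbol{\eta}_2 = \left\{2\left[\left(\mathbf{N}_1^{\mathsf T}\right)^2 \otimes \boldsymbol{\eta}_1^{\mathsf T}\right] + 2\left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1\right)\right\} d\mathrm{vec}\,\mathbf{U} \tag{12.33}$$

$$d\boldsymbol{\eta}_3 = \left\{6\left[\left(\mathbf{N}_1^{\mathsf T}\right)^3 \otimes \boldsymbol{\eta}_1^{\mathsf T}\right] + 6\left[\left(\mathbf{N}_1^{\mathsf T}\right)^2 \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1\right] + 3\left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_2^{\mathsf T}\mathbf{N}_1\right)\right\} d\mathrm{vec}\,\mathbf{U} \tag{12.34}$$

$$d\boldsymbol{\eta}_4 = \left\{24\left[\left(\mathbf{N}_1^{\mathsf T}\right)^4 \otimes \boldsymbol{\eta}_1^{\mathsf T}\right] + 24\left[\left(\mathbf{N}_1^{\mathsf T}\right)^3 \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1\right] + 12\left[\left(\mathbf{N}_1^{\mathsf T}\right)^2 \otimes \boldsymbol{\eta}_2^{\mathsf T}\mathbf{N}_1\right] + 4\left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_3^{\mathsf T}\mathbf{N}_1\right)\right\} d\mathrm{vec}\,\mathbf{U} \tag{12.35}$$

and, recursively,

$$d\boldsymbol{\eta}_k = k\,\mathbf{N}_1^{\mathsf T}\, d\boldsymbol{\eta}_{k-1} + k\left(\mathbf{I}_s \otimes \boldsymbol{\eta}_{k-1}^{\mathsf T}\right) d\mathrm{vec}\,\mathbf{N}_1. \tag{12.36}$$

The derivatives of the variance, standard deviation, and coefficient of variation of
the time to absorption are (suppressing the arguments)

$$dV = 2\left\{\left[\left(\mathbf{N}_1^{\mathsf T}\right)^2 \otimes \boldsymbol{\eta}_1^{\mathsf T}\right] + \left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\mathbf{N}_1\right) - \mathcal{D}(\boldsymbol{\eta}_1)\left(\mathbf{N}_1^{\mathsf T} \otimes \boldsymbol{\eta}_1^{\mathsf T}\right)\right\} d\mathrm{vec}\,\mathbf{U} \tag{12.37}$$

$$dSD = \frac{1}{2}\,\mathcal{D}(SD)^{-1}\, dV \tag{12.38}$$

$$dCV = \mathcal{D}(\boldsymbol{\eta}_1)^{-1}\, dSD - \left[SD^{\mathsf T}\mathcal{D}(\boldsymbol{\eta}_1)^{-1} \otimes \mathcal{D}(\boldsymbol{\eta}_1)^{-1}\right]\mathcal{D}(\mathrm{vec}\,\mathbf{I}_s)\left(\mathbf{1}_s \otimes \mathbf{I}_s\right) d\boldsymbol{\eta}_1. \tag{12.39}$$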


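Derivation Differentiating (12.24) for the expected time to absorption gives

$$d\boldsymbol{\eta}_1^{\mathsf T} = \mathbf{1}^{\mathsf T} d\mathbf{N}_1. \tag{12.40}$$

Applying the vec operator, substituting (12.7) for $d\mathrm{vec}\,\mathbf{N}_1$, and simplifying
gives (12.32). The derivatives of the higher moments are obtained in the same
way; e.g., for $\boldsymbol{\eta}_4$,

$$d\boldsymbol{\eta}_4^{\mathsf T} = (24)\,\mathbf{1}^{\mathsf T}\left[(d\mathbf{N}_1)\,\mathbf{N}_1^3 + \mathbf{N}_1\,(d\mathbf{N}_1)\,\mathbf{N}_1^2 + \mathbf{N}_1^2\,(d\mathbf{N}_1)\,\mathbf{N}_1 + \mathbf{N}_1^3\, d\mathbf{N}_1\right]. \tag{12.41}$$

Applying the vec operator yields

$$d\boldsymbol{\eta}_4 = 24\left\{\left[\left(\mathbf{N}_1^3\right)^{\mathsf T} \otimes \mathbf{1}^{\mathsf T}\right] + \left[\left(\mathbf{N}_1^2\right)^{\mathsf T} \otimes \mathbf{1}^{\mathsf T}\mathbf{N}_1\right] + \left[\mathbf{N}_1^{\mathsf T} \otimes \mathbf{1}^{\mathsf T}\mathbf{N}_1^2\right] + \left[\mathbf{I}_s \otimes \mathbf{1}^{\mathsf T}\mathbf{N}_1^3\right]\right\} d\mathrm{vec}\,\mathbf{N}_1. \tag{12.42}$$

Substituting (12.7) for $d\mathrm{vec}\,\mathbf{N}_1$ and simplifying using Eqs. (12.24), (12.25),
and (12.26) gives (12.35). The derivatives of the second and third moments, (12.33)
and (12.34), are obtained in similar fashion.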
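The recursive formula (12.36) is obtained by differentiating (12.28)

$$d\boldsymbol{\eta}_k^{\mathsf T} = k\left(d\boldsymbol{\eta}_{k-1}^{\mathsf T}\right)\mathbf{N}_1 + k\,\boldsymbol{\eta}_{k-1}^{\mathsf T}\, d\mathbf{N}_1. \tag{12.43}$$

Apply the vec operator,

$$d\boldsymbol{\eta}_k = k\,\mathbf{N}_1^{\mathsf T}\, d\boldsymbol{\eta}_{k-1} + k\left(\mathbf{I}_s \otimes \boldsymbol{\eta}_{k-1}^{\mathsf T}\right) d\mathrm{vec}\,\mathbf{N}_1, \tag{12.44}$$

substitute (12.7) for $d\mathrm{vec}\,\mathbf{N}_1$, and simplify, to obtain (12.36).

Differentiating (12.29) for the variance yields

$$dV = d\boldsymbol{\eta}_2 - 2\boldsymbol{\eta}_1 \circ d\boldsymbol{\eta}_1. \tag{12.45}$$

Applying the vec operator gives

$$dV = d\boldsymbol{\eta}_2 - 2\,\mathcal{D}(\boldsymbol{\eta}_1)\, d\boldsymbol{\eta}_1. \tag{12.46}$$

Substituting (12.32) for $d\boldsymbol{\eta}_1$ and (12.33) for $d\boldsymbol{\eta}_2$ gives the result (12.37). The
derivatives of the standard deviation, in (12.38), and the coefficient of variation,
in (12.39), are obtained by differentiating (12.30) and (12.31) and applying (2.83)
and (2.84).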
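
The longevity statistics and the variance sensitivity (12.37) can be checked on a small example. The following sketch uses a hypothetical two-state transient block and compares (12.37) with a central finite difference; the function names and the perturbation `dU` are illustrative assumptions.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major vec operator)."""
    return A.reshape(-1, order="F")

def longevity(U):
    """Mean, variance, and CV of the time to absorption for each starting state."""
    N1 = -np.linalg.inv(U)
    one = np.ones(U.shape[0])
    eta1 = N1.T @ one                       # (12.24): eta1^T = 1^T N1
    eta2 = 2.0 * (N1 @ N1).T @ one          # (12.25): eta2^T = 2 1^T N1^2
    V = eta2 - eta1**2                      # (12.29)
    return eta1, V, np.sqrt(V) / eta1       # (12.30), (12.31)

def dV_dvecU(U):
    """Jacobian dV / dvec^T U from (12.37)."""
    N1 = -np.linalg.inv(U)
    eta1 = N1.T @ np.ones(U.shape[0])
    row = eta1[None, :]                     # eta1^T as a 1 x s row
    return 2.0 * (np.kron(N1.T @ N1.T, row)
                  + np.kron(N1.T, row @ N1)
                  - np.diag(eta1) @ np.kron(N1.T, row))

U = np.array([[-3.0, 1.0],
              [2.0, -4.0]])                 # hypothetical transient rate block
eta1, V, CV = longevity(U)

dU = np.array([[0.3, -0.2],
               [0.1, 0.5]])
eps = 1e-6
fd = (longevity(U + eps * dU)[1] - longevity(U - eps * dU)[1]) / (2 * eps)
analytic = dV_dvecU(U) @ vec(dU)
print(eta1, V, CV)
```

With a single transient state and exit rate $\mu$, the same functions return a mean of $1/\mu$ and a coefficient of variation of 1, the values for an exponential lifetime.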


12.4 Multiple Absorbing States and Probabilities of 
Absorption 


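Consider a chain that includes $a > 1$ absorbing states. The entry $m_{ij}$ of the $a \times s$
submatrix $\mathbf{M}$ in (12.1) is the rate of transition from transient state $j$ to absorbing
state $i$. The probabilities of absorption are defined as

$$b_{ij} = P\left[\text{absorption in } i \mid \text{starting in } j\right]. \tag{12.47}$$

The $a \times s$ matrix $\mathbf{B} = (b_{ij})$ is

$$\mathbf{B} = \mathbf{M}\mathbf{N}_1 \tag{12.48}$$

(Iosifescu 1980, Section 8.5.6). Column $j$ of $\mathbf{B}$ is the probability distribution of the
eventual absorption state for an individual starting in transient state $j$. Usually a few
starting states are of particular interest (e.g., states corresponding to "birth"). Let
$\mathbf{B}(:,j) = \mathbf{B}\mathbf{e}_j$ denote column $j$ of $\mathbf{B}$, where $\mathbf{e}_j$ is the $j$th unit vector of length $s$.
Then

$$d\mathbf{B}(:,j) = \left(\mathbf{e}_j^{\mathsf T} \otimes \mathbf{I}_a\right) d\mathrm{vec}\,\mathbf{B}. \tag{12.49}$$

Similarly, row $i$ of $\mathbf{B}$ is $\mathbf{B}(i,:) = \mathbf{e}_i^{\mathsf T}\mathbf{B}$ and

$$d\mathrm{vec}\,\mathbf{B}(i,:) = \left(\mathbf{I}_s \otimes \mathbf{e}_i^{\mathsf T}\right) d\mathrm{vec}\,\mathbf{B} \tag{12.50}$$

where $\mathbf{e}_i$ is the $i$th unit vector of length $a$. The derivative of $\mathbf{B}$ in (12.49)
and (12.50) is

$$d\mathrm{vec}\,\mathbf{B} = \left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{I}_a\right) d\mathrm{vec}\,\mathbf{M} + \left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{B}\right) d\mathrm{vec}\,\mathbf{U}. \tag{12.51}$$

Derivations Differentiating (12.48) yields

$$d\mathbf{B} = (d\mathbf{M})\mathbf{N}_1 + \mathbf{M}(d\mathbf{N}_1). \tag{12.52}$$

Applying the vec operator and simplifying gives

$$d\mathrm{vec}\,\mathbf{B} = \left(\mathbf{N}_1^{\mathsf T} \otimes \mathbf{I}_a\right) d\mathrm{vec}\,\mathbf{M} + \left(\mathbf{I}_s \otimes \mathbf{M}\right) d\mathrm{vec}\,\mathbf{N}_1. \tag{12.53}$$

Substituting (12.7) for $d\mathrm{vec}\,\mathbf{N}_1$ and simplifying gives (12.51).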
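
As a hedged numerical check of (12.48) and (12.51), the sketch below computes absorption probabilities for a hypothetical chain with two transient and two absorbing states and compares the first-order change predicted by (12.51) with a central finite difference. The matrices, perturbations, and function names are illustrative assumptions.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major vec operator)."""
    return A.reshape(-1, order="F")

def absorption_probs(U, M):
    """B = M N1 with N1 = -U^{-1}, as in (12.48)."""
    return M @ (-np.linalg.inv(U))

def dvecB(U, M, dU, dM):
    """First-order change in vec B for perturbations dU and dM, from (12.51)."""
    N1 = -np.linalg.inv(U)
    B = M @ N1
    a = M.shape[0]
    return np.kron(N1.T, np.eye(a)) @ vec(dM) + np.kron(N1.T, B) @ vec(dU)

# Hypothetical intensity blocks; each column of [U; M] sums to zero.
U = np.array([[-3.0, 1.0],
              [2.0, -4.0]])
M = np.array([[0.5, 1.0],
              [0.5, 2.0]])
B = absorption_probs(U, M)

dU = np.array([[0.3, -0.2],
               [0.1, 0.5]])
dM = np.array([[0.2, 0.0],
               [-0.1, 0.4]])
eps = 1e-6
fd = (absorption_probs(U + eps * dU, M + eps * dM)
      - absorption_probs(U - eps * dU, M - eps * dM)) / (2 * eps)
analytic = dvecB(U, M, dU, dM)
print(B)
```

Each column of `B` sums to 1 because absorption is certain. The perturbations here are arbitrary directions, so they test (12.51) as a derivative of the matrix function $\mathbf{M}\mathbf{N}_1$ rather than of a constrained intensity matrix.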


12.5 The Embedded Chain: Discrete Transitions Within a 
Continuous Process 


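If a continuous-time chain is observed only at the moments when it changes state,
the result is a discrete-time process called the embedded Markov chain, or the jump
chain, associated with $\mathbf{Q}$ (Iosifescu 1980, Section 8.3.2). The transition matrix of
this embedded chain can be written

$$\tilde{\mathbf{P}} = \left(\begin{array}{c|c} \tilde{\mathbf{U}} & \mathbf{0} \\ \hline \tilde{\mathbf{M}} & \mathbf{I}_a \end{array}\right) \tag{12.54}$$

where

$$\tilde{\mathbf{U}} = \mathbf{I}_s - \mathbf{U}\mathbf{U}_{\rm dg}^{-1} \tag{12.55}$$

$$\tilde{\mathbf{M}} = -\mathbf{M}\mathbf{U}_{\rm dg}^{-1}. \tag{12.56}$$

The embedded chain provides information on the number of visits to each transient
state, rather than the time spent in each transient state. The expected numbers of
such visits are given by the fundamental matrix

$$\tilde{\mathbf{N}}_1 = \left(\mathbf{I}_s - \tilde{\mathbf{U}}\right)^{-1}. \tag{12.57}$$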
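The sensitivity analysis of the embedded chain follows directly from the
discrete-time results in previous chapters (Chaps. 4 and 5).
In particular, the differential of $\tilde{\mathbf{N}}_1$ is (Caswell 2006)

$$d\mathrm{vec}\,\tilde{\mathbf{N}}_1 = \left(\tilde{\mathbf{N}}_1^{\mathsf T} \otimes \tilde{\mathbf{N}}_1\right) d\mathrm{vec}\,\tilde{\mathbf{U}}. \tag{12.58}$$

However, this derivative is unlikely to be the sensitivity we are looking for. The
continuous-time chain is likely to be parameterized in terms of the rate matrices $\mathbf{U}$
and $\mathbf{M}$, rather than the probability matrices $\tilde{\mathbf{U}}$ and $\tilde{\mathbf{M}}$. To express the perturbation
analysis of $\tilde{\mathbf{P}}$ in terms of the parameters of $\mathbf{Q}$ requires the derivatives of the
embedded chain with respect to the continuous chain; i.e.,

$$\frac{d\mathrm{vec}\,\tilde{\mathbf{U}}}{d\mathrm{vec}^{\mathsf T}\mathbf{U}} \quad \text{and} \quad \frac{d\mathrm{vec}\,\tilde{\mathbf{M}}}{d\mathrm{vec}^{\mathsf T}\mathbf{M}}.$$

These derivatives are

$$d\mathrm{vec}\,\tilde{\mathbf{U}} = \left[-\left(\mathbf{U}_{\rm dg}^{-1} \otimes \mathbf{I}_s\right) + \left(\mathbf{U}_{\rm dg}^{-1} \otimes \mathbf{U}\mathbf{U}_{\rm dg}^{-1}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I}_s)\right] d\mathrm{vec}\,\mathbf{U} \tag{12.59}$$

$$d\mathrm{vec}\,\tilde{\mathbf{M}} = -\left(\mathbf{U}_{\rm dg}^{-1} \otimes \mathbf{I}_a\right) d\mathrm{vec}\,\mathbf{M} + \left(\mathbf{I}_s \otimes \mathbf{M}\right)\left(\mathbf{U}_{\rm dg}^{-1} \otimes \mathbf{U}_{\rm dg}^{-1}\right)\mathcal{D}(\mathrm{vec}\,\mathbf{I}_s)\, d\mathrm{vec}\,\mathbf{U}. \tag{12.60}$$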

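Using (12.58) and (12.59), one can write

$$\frac{d\mathrm{vec}\,\tilde{\mathbf{N}}_1}{d\boldsymbol{\theta}^{\mathsf T}} = \left(\tilde{\mathbf{N}}_1^{\mathsf T} \otimes \tilde{\mathbf{N}}_1\right)\frac{d\mathrm{vec}\,\tilde{\mathbf{U}}}{d\mathrm{vec}^{\mathsf T}\mathbf{U}}\,\frac{d\mathrm{vec}\,\mathbf{U}}{d\boldsymbol{\theta}^{\mathsf T}}. \tag{12.61}$$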
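
The construction of the embedded chain and the chain rule (12.61) can be illustrated numerically. In the sketch below the parameter vector is taken to be $\boldsymbol{\theta} = \mathrm{vec}\,\mathbf{U}$, so that $d\mathrm{vec}\,\mathbf{U}/d\boldsymbol{\theta}^{\mathsf T}$ is the identity; this choice, the rate matrices, and the function names are assumptions made for the example.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major vec operator)."""
    return A.reshape(-1, order="F")

def embedded(U, M):
    """Jump-chain blocks (12.55), (12.56) and expected visit counts (12.57)."""
    s = U.shape[0]
    U_dg_inv = np.diag(1.0 / np.diag(U))
    U_t = np.eye(s) - U @ U_dg_inv
    M_t = -M @ U_dg_inv
    N_t = np.linalg.inv(np.eye(s) - U_t)
    return U_t, M_t, N_t

def dvecUt_dvecU(U):
    """Jacobian of vec U-tilde with respect to vec U, as in (12.59)."""
    s = U.shape[0]
    I = np.eye(s)
    U_dg_inv = np.diag(1.0 / np.diag(U))
    return -np.kron(U_dg_inv, I) + np.kron(U_dg_inv, U @ U_dg_inv) @ np.diag(vec(I))

def dvecNt_dvecU(U):
    """Chain rule (12.61) with theta = vec U."""
    s = U.shape[0]
    U_t = np.eye(s) - U @ np.diag(1.0 / np.diag(U))
    N_t = np.linalg.inv(np.eye(s) - U_t)
    return np.kron(N_t.T, N_t) @ dvecUt_dvecU(U)

U = np.array([[-3.0, 1.0],
              [2.0, -4.0]])                 # hypothetical rates among transient states
M = np.array([[0.5, 1.0],
              [0.5, 2.0]])                  # hypothetical rates into absorbing states
U_t, M_t, N_t = embedded(U, M)

dU = np.array([[0.3, -0.2],
               [0.1, 0.5]])
eps = 1e-6
fd = (embedded(U + eps * dU, M)[2] - embedded(U - eps * dU, M)[2]) / (2 * eps)
analytic = dvecNt_dvecU(U) @ vec(dU)
print(U_t, N_t, sep="\n")
```

In this example an individual starting in state 1 makes 1.2 expected visits to that state, each lasting 1/3 time units on average, which recovers the expected occupancy time of 0.4 from $\mathbf{N}_1 = -\mathbf{U}^{-1}$.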
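
Derivation. Differentiate $\tilde{U}$ in (12.55),

$$d\tilde{U} = -(dU)\,U_{\mathrm{dg}}^{-1} - U\left(dU_{\mathrm{dg}}^{-1}\right), \tag{12.62}$$

apply the vec operator, and use (2.82) and (11.12) for $d\,\operatorname{vec} U_{\mathrm{dg}}^{-1}$. The result is

$$\begin{aligned}
d\,\operatorname{vec}\tilde{U} &= -\left(U_{\mathrm{dg}}^{-1} \otimes I_s\right) d\,\operatorname{vec} U - \left(I_s \otimes U\right) d\,\operatorname{vec} U_{\mathrm{dg}}^{-1} \\
&= -\left(U_{\mathrm{dg}}^{-1} \otimes I_s\right) d\,\operatorname{vec} U + \left(I_s \otimes U\right)\left(U_{\mathrm{dg}}^{-1} \otimes U_{\mathrm{dg}}^{-1}\right) \mathcal{D}(\operatorname{vec} I_s)\, d\,\operatorname{vec} U
\end{aligned}$$

which simplifies to give (12.59). Similarly, differentiating $\tilde{M}$ in (12.56) and applying
the vec operator gives

$$d\,\operatorname{vec}\tilde{M} = -\left(U_{\mathrm{dg}}^{-1} \otimes I_a\right) d\,\operatorname{vec} M - \left(I_s \otimes M\right) d\,\operatorname{vec} U_{\mathrm{dg}}^{-1}. \tag{12.63}$$

Using (2.82) and (11.12) for $d\,\operatorname{vec} U_{\mathrm{dg}}^{-1}$ and simplifying gives (12.60).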
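
The derivative in (12.59) can be checked numerically. The sketch below, which assumes NumPy and reuses a small hypothetical transient block (not from the text), evaluates the bracketed matrix in (12.59) and compares it with central finite differences of (12.55). The function names are illustrative assumptions.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into a vector (column-major order)."""
    return A.reshape(-1, order="F")

def embedded_U(U):
    """U-tilde of (12.55): I - U U_dg^{-1}."""
    Udg_inv = np.diag(1.0 / np.diag(U))
    return np.eye(U.shape[0]) - U @ Udg_inv

def jacobian_12_59(U):
    """Bracketed matrix in (12.59): d vec U-tilde / d vec^T U."""
    s = U.shape[0]
    I = np.eye(s)
    Udg_inv = np.diag(1.0 / np.diag(U))
    D = np.diag(vec(I))                       # D(vec I_s): selects the diagonal entries of U
    return -np.kron(Udg_inv, I) + np.kron(Udg_inv, U @ Udg_inv) @ D

U = np.array([[-3.0, 1.0],
              [2.0, -2.0]])                   # hypothetical transient block
J = jacobian_12_59(U)

# Central finite differences, perturbing one entry of vec U at a time.
h = 1e-6
J_fd = np.zeros_like(J)
for k in range(U.size):
    dU = np.zeros(U.size)
    dU[k] = h
    dU = dU.reshape(U.shape, order="F")
    J_fd[:, k] = (vec(embedded_U(U + dU)) - vec(embedded_U(U - dU))) / (2 * h)

max_abs_diff = np.max(np.abs(J - J_fd))
print(max_abs_diff)
```

The check treats every entry of U as free, which is what the vec-based derivative assumes; the constraint that columns of Q sum to zero enters later, through the derivative of vec U with respect to the parameters.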

12.6 An Example: A Model of Disease Progression


An important area of application of continuous-time Markov chains is the modelling 
of transitions among disease states. In this context, the time to absorption is 
longevity, and the time spent in various transient states has implications for the 
quality of life during the disease. Fix and Neyman (1951) introduced the idea and 
proposed a 4-state model for cancer, with two transient states (under treatment or 
not) and two absorbing states (death from cancer or from other causes). Kay (1986) 
proposed a model with k disease states and an absorbing state representing death. 
There is now a large literature on such models and their estimation. Recently, studies 
have proliferated that use Markov chain models of disease transmission to explore 
the cost-effectiveness of screening and treatment procedures (e.g., Kuo et al. 1999; 
Chen et al. 1999; Wu et al. 2006; Sonnenberg and Beck 1993). 

Sensitivity analysis reveals how these demographic properties respond to 
changes in parameters. As an example, I consider a model for the progression 
of colorectal cancer (CRC) that was developed to study the cost-effectiveness of a 
new CRC screening technique based on DNA testing of stool samples (Wu et al. 
2006). The model includes 7 transient states (normal, small and large adenoma, 
early and late preclinical CRC, and early and late clinical CRC) and 2 absorbing 
states (death from CRC and death from other causes); see Fig. 12.1. Parameters 
were estimated from the literature and from clinical studies in Taiwan. 

This model, which describes the so-called natural history of the disease, was 
embedded in a larger decision model to compare the cost-effectiveness of screening 
strategies. The intensity matrix (12.1) corresponding to Fig. 12.1 is

$$Q = \begin{pmatrix}
-\lambda_1-\mu & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\lambda_1 & -\lambda_2-\mu & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \lambda_2 & -\lambda_3-\mu & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \lambda_3 & -\lambda_4-\lambda_5-\mu & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda_4 & -\lambda_6-\mu & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda_5 & 0 & -\lambda_7-\mu & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \lambda_6 & 0 & -\lambda_8-\mu & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \lambda_7 & \lambda_8 & 0 & 0 \\
\mu & \mu & \mu & \mu & \mu & \mu & \mu & 0 & 0
\end{pmatrix} \tag{12.64}$$

[Figure 12.1 diagram omitted. States shown: 1 Normal; 2 Small adenoma; 3 Large adenoma;
4 Preclinical early CRC; 5 Preclinical late CRC; 6 Clinical early CRC; 7 Clinical late CRC;
death from CRC; death from other causes (OCD).]

Fig. 12.1 State transition diagram for an absorbing Markov chain model of colorectal cancer
(CRC) progression. The model includes 7 transient states based on the stage of development of
adenoma (polyps) or cancer, and two absorbing states corresponding to death from CRC and death
from other causes (OCD). Transition rates are given by $\lambda_i$, and mortality rate from other
causes by $\mu$. (Modified, under the terms of a Creative Commons Attribution License, from
Figure 1 of Wu et al. 2006)



The $\lambda_i$ are transition rates; $\mu$ is the mortality rate from other causes of death. The
incidence rate of small adenoma ($\lambda_1$) and the mortality rate due to other causes
of death ($\mu$) are age-dependent. Here I have analyzed values for age 70, based
on figures in Wu et al. (2006). This leads to a parameter vector (all rates are per
year):

$$\theta = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_8 \\ \mu \end{pmatrix} = \begin{pmatrix}
1.52 \times 10^{-2} \\
3.46 \times 10^{-2} \\
2.15 \times 10^{-2} \\
3.70 \times 10^{-1} \\
2.38 \times 10^{-1} \\
4.85 \times 10^{-1} \\
3.02 \times 10^{-2} \\
2.10 \times 10^{-1} \\
2.20 \times 10^{-2}
\end{pmatrix} \tag{12.65}$$

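
12.6.1 Sensitivity Results

The fundamental matrix (12.2) is

$$N_1 = \begin{pmatrix}
26.9 & 0 & 0 & 0 & 0 & 0 & 0 \\
7.2 & 17.7 & 0 & 0 & 0 & 0 & 0 \\
5.7 & 14.0 & 23.0 & 0 & 0 & 0 & 0 \\
0.2 & 0.5 & 0.8 & 1.6 & 0 & 0 & 0 \\
0.1 & 0.4 & 0.6 & 1.2 & 2.0 & 0 & 0 \\
0.9 & 2.2 & 3.6 & 7.2 & 0 & 19.2 & 0 \\
0.3 & 0.7 & 1.2 & 2.4 & 4.1 & 0 & 4.3
\end{pmatrix}. \tag{12.66}$$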


Thus, given these rates, a 70-year-old individual in the normal state would expect
to spend 27 years in stage 1, and only 0.9 and 0.3 years in stages 6 and 7 (early
and late clinical CRC).¹ Individuals in more advanced stages can expect to spend
progressively longer periods in stages 6 and 7 (compare across rows 6 and 7 of $N_1$).
The standard deviations (12.13) of the times spent in the transient states are

$$\mathrm{SD}(\nu_{ij}) = \begin{pmatrix}
26.9 & 0 & 0 & 0 & 0 & 0 & 0 \\
14.2 & 17.7 & 0 & 0 & 0 & 0 & 0 \\
15.2 & 21.2 & 23.0 & 0 & 0 & 0 & 0 \\
0.8 & 1.1 & 1.4 & 1.6 & 0 & 0 & 0 \\
0.7 & 1.1 & 1.4 & 1.8 & 2.0 & 0 & 0 \\
5.8 & 8.9 & 11.2 & 15.0 & 0 & 19.2 & 0 \\
1.6 & 2.4 & 3.0 & 3.9 & 4.3 & 0 & 4.3
\end{pmatrix}. \tag{12.67}$$

Clearly, considerable variation can be expected in the times spent in the various
states; the standard deviation equals or exceeds the mean in every case.
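
The means and standard deviations above can be reproduced approximately with a short NumPy sketch. The helper `crc_blocks` is a name of my own; it assembles the transient and mortality blocks of (12.64) from the parameter vector (12.65). The variance formula used here, twice the diagonal part of the fundamental matrix times the fundamental matrix, minus its elementwise square, is the standard result for occupancy times in absorbing continuous-time chains; I assume it is what (12.13) states, since that equation lies outside this section.

```python
import numpy as np

# Parameter vector (12.65): lambda_1..lambda_8, then mu (all per year)
THETA = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])

def crc_blocks(theta):
    """Return U (7x7) and M (2x7) of Q in (12.64); column j holds rates out of state j."""
    lam, mu = theta[:8], theta[8]
    U, M = np.zeros((7, 7)), np.zeros((2, 7))
    # (from state, to state, lambda index), 1-based as in Fig. 12.1
    for j, i, k in [(1, 2, 1), (2, 3, 2), (3, 4, 3), (4, 5, 4), (4, 6, 5), (5, 7, 6)]:
        U[i - 1, j - 1] = lam[k - 1]
    M[0, 5], M[0, 6] = lam[6], lam[7]            # death from CRC (from states 6 and 7)
    M[1, :] = mu                                 # death from other causes
    U -= np.diag(U.sum(axis=0) + M.sum(axis=0))  # diagonal = minus total exit rate
    return U, M

U, M = crc_blocks(THETA)
N1 = -np.linalg.inv(U)                           # mean time in state i, starting in state j

# Variance of the time spent: 2 (N1)_dg N1 - N1 o N1 (elementwise square).
V = 2.0 * np.diag(np.diag(N1)) @ N1 - N1 * N1
SD = np.sqrt(np.maximum(V, 0.0))                 # guard against tiny negative round-off

print(np.round(N1, 1))
print(np.round(SD, 1))
```

Because no state can be re-entered in this model, each diagonal entry of the standard deviation matrix equals the corresponding mean (a single exponential sojourn), and every off-diagonal standard deviation exceeds its mean.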

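Turning to the sensitivity analysis of the time spent in transient states, focus on
the fate of a normal (state 1) individual. The expected times spent in each state by
such an individual are given by $N_1(:,1)$. From (12.7) and (2.55) the sensitivity and
elasticity of $N_1(:,1)$ are

$$\frac{dN_1(:,1)}{d\theta^{\mathsf T}} = \begin{pmatrix}
-722.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -722.6 \\
280.9 & -127.5 & 0 & 0 & 0 & 0 & 0 & 0 & -321.6 \\
223.4 & 64.5 & -132.0 & 0 & 0 & 0 & 0 & 0 & -387.8 \\
7.6 & 2.2 & 4.6 & -0.3 & -0.3 & 0 & 0 & 0 & -13.5 \\
5.6 & 1.6 & 3.4 & 0.2 & -0.2 & -0.3 & 0 & 0 & -10.2 \\
34.8 & 10.0 & 21.0 & -1.4 & 2.3 & 0 & -17.1 & 0 & -79.0 \\
11.6 & 3.4 & 7.0 & 0.3 & -0.5 & 0 & 0 & -1.3 & -22.5
\end{pmatrix}$$

$$\frac{\epsilon N_1(:,1)}{\epsilon\theta^{\mathsf T}} = \begin{pmatrix}
-0.4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.6 \\
0.6 & -0.6 & 0 & 0 & 0 & 0 & 0 & 0 & -1.0 \\
0.6 & 0.4 & -0.5 & 0 & 0 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & -0.6 & -0.4 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & 0.4 & -0.4 & -1.0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & -0.6 & 0.6 & 0 & -0.6 & 0 & -1.9 \\
0.6 & 0.4 & 0.5 & 0.4 & -0.4 & 0.0 & 0 & -0.9 & -1.7
\end{pmatrix}. \tag{12.68}$$
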

These elasticities imply that a 1% increase in $\lambda_1$ will (to first order) cause about
a 0.4% decrease in the mean time spent in the normal state and a 0.6% increase in
the mean time spent in each other state. A 1% increase in $\lambda_4$ (the rate of transition
between early and late preclinical CRC) creates a 0.6% decrease in the time spent
in stages 4 and 6 (the early CRC stages) and a 0.4% increase in the time spent in
stages 5 and 7 (the late CRC stages). An increase in the mortality rate $\mu$ due to other
causes of death reduces the time spent in any of the transient states.

¹ This calculation holds the mortality rate fixed at its value at age 70; in reality it
increases with age. Wu et al. (2006) included age variation by providing values of $\lambda_1$
(the rate of progression from normal to small adenoma) specific to 5-year intervals from
50 to 70 years of age; all other parameters were age-invariant.
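
A sketch of how such sensitivities and elasticities can be computed, assuming NumPy. It uses the identity that the differential of the fundamental matrix is the fundamental matrix times the differential of the transient block times the fundamental matrix again, which I take to be the content of (12.7), followed by the chain rule through the parameter vector. The helper names are my own. Because the transient block is linear in the parameters, its derivative with respect to each parameter is obtained exactly by evaluating the builder at a unit vector.

```python
import numpy as np

THETA = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])   # (12.65)

def crc_blocks(theta):
    """Return U (7x7) and M (2x7) of Q in (12.64); column j holds rates out of state j."""
    lam, mu = theta[:8], theta[8]
    U, M = np.zeros((7, 7)), np.zeros((2, 7))
    for j, i, k in [(1, 2, 1), (2, 3, 2), (3, 4, 3), (4, 5, 4), (4, 6, 5), (5, 7, 6)]:
        U[i - 1, j - 1] = lam[k - 1]
    M[0, 5], M[0, 6] = lam[6], lam[7]
    M[1, :] = mu
    U -= np.diag(U.sum(axis=0) + M.sum(axis=0))
    return U, M

def vec(A):
    """Stack the columns of A into a vector (column-major order)."""
    return A.reshape(-1, order="F")

U, M = crc_blocks(THETA)
N1 = -np.linalg.inv(U)

# U is linear in theta with U(0) = 0, so d vec U / d theta_k = vec U(e_k) exactly.
dvecU_dtheta = np.column_stack([vec(crc_blocks(e)[0]) for e in np.eye(9)])  # 49 x 9

# N1 = -U^{-1} implies dN1 = N1 (dU) N1, i.e. d vec N1 = (N1^T kron N1) d vec U.
dvecN_dtheta = np.kron(N1.T, N1) @ dvecU_dtheta

S = dvecN_dtheta[:7, :]                 # first 7 entries of vec N1 form column 1 of N1
E = S * THETA[None, :] / N1[:, [0]]     # elasticity: (theta_k / n_i1) * d n_i1 / d theta_k

print(np.round(S, 1))
print(np.round(E, 1))
```

Each row of the elasticity matrix sums to -1: multiplying every rate by a common factor rescales time, so all expected occupancy times shrink by the same factor.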

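The elasticity of the variance in the time spent in the transient states by an
individual in state 1 is

$$\frac{\epsilon V(\nu_{i1})}{\epsilon\theta^{\mathsf T}} = \begin{pmatrix}
-0.8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1.2 \\
0.4 & -1.2 & 0 & 0 & 0 & 0 & 0 & 0 & -1.2 \\
0.5 & 0.3 & -1.0 & 0 & 0 & 0 & 0 & 0 & -1.8 \\
0.5 & 0.4 & 0.5 & -1.2 & -0.8 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & 0.4 & -0.4 & -1.9 & 0 & 0 & -1.6 \\
0.6 & 0.4 & 0.5 & -0.6 & 0.6 & 0 & -1.2 & 0 & -2.3 \\
0.6 & 0.4 & 0.5 & 0.4 & -0.4 & 0.0 & 0 & -1.8 & -1.7
\end{pmatrix}. \tag{12.69}$$
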

The sign pattern is the same as that of the elasticities of the mean times in (12.68), 
so we conclude that any parameter change that increases the mean time spent in 
a transient state will also increase the variance in that time. The elasticities of the 
variance are comparable to those of the mean (cf. (12.68) and (12.69)), showing that 
the means and the variance respond with roughly equal proportional changes. 

Longevity is measured by the time to absorption, and is a primary concern in 
analyses of screening or treatment protocols. The vectors of the mean, standard 
deviation, and coefficient of variation of longevity are 


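$$\eta_1 = \begin{pmatrix} 41.4 \\ 35.5 \\ 29.1 \\ 12.4 \\ 6.1 \\ 19.2 \\ 4.3 \end{pmatrix}, \qquad
\mathrm{SD}(\eta) = \begin{pmatrix} 37.4 \\ 30.3 \\ 25.8 \\ 14.1 \\ 4.7 \\ 19.2 \\ 4.3 \end{pmatrix}, \qquad
\mathrm{CV}(\eta) = \begin{pmatrix} 0.9 \\ 0.9 \\ 0.9 \\ 1.1 \\ 0.8 \\ 1.0 \\ 1.0 \end{pmatrix}. \tag{12.70}$$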
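
The longevity statistics can be sketched as follows, assuming NumPy. Mean longevity is the vector of column sums of the fundamental matrix, and the second moment of the time to absorption is twice the transposed fundamental matrix applied to that vector, a standard phase-type result. The chapter's own moment formulas are outside this section, so this is an illustration rather than a transcription of them; the helper names are my own.

```python
import numpy as np

THETA = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])   # (12.65)

def crc_blocks(theta):
    """Return U (7x7) and M (2x7) of Q in (12.64); column j holds rates out of state j."""
    lam, mu = theta[:8], theta[8]
    U, M = np.zeros((7, 7)), np.zeros((2, 7))
    for j, i, k in [(1, 2, 1), (2, 3, 2), (3, 4, 3), (4, 5, 4), (4, 6, 5), (5, 7, 6)]:
        U[i - 1, j - 1] = lam[k - 1]
    M[0, 5], M[0, 6] = lam[6], lam[7]
    M[1, :] = mu
    U -= np.diag(U.sum(axis=0) + M.sum(axis=0))
    return U, M

U, M = crc_blocks(THETA)
N1 = -np.linalg.inv(U)

eta1 = N1.sum(axis=0)            # mean time to absorption from each starting state
eta2 = 2.0 * N1.T @ eta1         # second moment of time to absorption
sd = np.sqrt(eta2 - eta1**2)     # standard deviation of longevity
cv = sd / eta1                   # coefficient of variation

print(np.round(eta1, 1))
print(np.round(sd, 1))
print(np.round(cv, 1))
```

For states 6 and 7 the only possible event is a single exponentially distributed exit, which is why their standard deviation equals their mean and the coefficient of variation is 1.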


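The sensitivity and elasticity of expected longevity (life expectancy) with respect to
$\theta$ are

$$\frac{d\eta_1}{d\theta^{\mathsf T}} = \begin{pmatrix}
-158.7 & -45.8 & -96.0 & -1.2 & 1.3 & -0.2 & -17.1 & -1.3 & -1557.2 \\
0 & -112.2 & -234.9 & -3.0 & 3.2 & -0.6 & -41.9 & -3.2 & -1089.1 \\
0 & 0 & -384.2 & -5.0 & 5.3 & -1.0 & -68.6 & -5.2 & -756.5 \\
0 & 0 & 0 & -10.0 & 10.7 & -2.1 & -138.8 & -10.4 & -176.0 \\
0 & 0 & 0 & 0 & 0 & -3.5 & 0 & -17.8 & -29.8 \\
0 & 0 & 0 & 0 & 0 & 0 & -367.0 & 0 & -367.0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -18.6 & -18.6
\end{pmatrix}$$

$$\frac{\epsilon\eta_1}{\epsilon\theta^{\mathsf T}} = \begin{pmatrix}
-0.06 & -0.04 & -0.05 & -0.01 & 0.01 & -0.00 & -0.01 & -0.01 & -0.83 \\
0 & -0.11 & -0.14 & -0.03 & 0.02 & -0.01 & -0.04 & -0.02 & -0.68 \\
0 & 0 & -0.28 & -0.06 & 0.04 & -0.02 & -0.07 & -0.04 & -0.57 \\
0 & 0 & 0 & -0.30 & 0.21 & -0.08 & -0.34 & -0.18 & -0.31 \\
0 & 0 & 0 & 0 & 0 & -0.28 & 0 & -0.61 & -0.11 \\
0 & 0 & 0 & 0 & 0 & 0 & -0.58 & 0 & -0.42 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.91 & -0.09
\end{pmatrix}. \tag{12.71}$$
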
Almost all the nonzero elements are negative, because increasing any of the
rates leading towards clinical CRC reduces life expectancy, as does increasing the
mortality rate due to other causes of death. The exceptions are the sensitivities and
elasticities of $\eta_1$ to $\lambda_5$ (in column 5 of these matrices), which are positive because
$\lambda_5$ delays the onset of clinical CRC (cf. Fig. 12.1).

The elasticities of $E(\eta_1)$, the life expectancy of a normal individual, to a change
in $\theta$, appear in the first row of (12.71). The largest of these (except for the last
column, representing mortality from other causes of death) are to changes in
$\lambda_1$, $\lambda_2$, and $\lambda_3$, the rates of transition from normal to small adenoma, small to
large adenoma, and large adenoma to preclinical CRC. The rates $\lambda_2$ and $\lambda_3$ have
large effects on $E(\eta_2)$, and $\lambda_3$ has a large effect on $E(\eta_3)$. These transitions are
targets of screening and early treatment; this analysis quantifies the effect that such
interventions could have.

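The sensitivity and elasticity of the standard deviation of longevity are

$$\frac{d\,\mathrm{SD}(\eta)}{d\theta^{\mathsf T}} = \begin{pmatrix}
-0.27 & -0.07 & -0.16 & -0.00 & 0.00 & -0.00 & -0.03 & -0.00 & -1.19 \\
0 & -0.13 & -0.31 & -0.00 & 0.00 & -0.00 & -0.06 & -0.00 & -0.76 \\
0 & 0 & -0.43 & -0.00 & 0.00 & -0.00 & -0.09 & -0.00 & -0.61 \\
0 & 0 & 0 & -0.01 & 0.01 & 0.00 & -0.27 & 0.00 & -0.27 \\
0 & 0 & 0 & 0 & 0 & -0.00 & 0.00 & -0.02 & -0.02 \\
0 & 0 & 0 & 0 & 0 & 0 & -0.37 & 0 & -0.37 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.02 & -0.02
\end{pmatrix} \times 10^{3} \tag{12.72}$$

and

$$\frac{\epsilon\,\mathrm{SD}(\eta)}{\epsilon\theta^{\mathsf T}} = \begin{pmatrix}
-0.11 & -0.06 & -0.09 & -0.02 & 0.01 & -0.00 & -0.02 & -0.01 & -0.70 \\
0 & -0.15 & -0.22 & -0.04 & 0.03 & -0.00 & -0.06 & -0.01 & -0.55 \\
0 & 0 & -0.36 & -0.05 & 0.05 & -0.00 & -0.11 & -0.01 & -0.52 \\
0 & 0 & 0 & -0.23 & 0.23 & 0.01 & -0.58 & 0.00 & -0.43 \\
0 & 0 & 0 & 0 & 0 & -0.16 & 0.00 & -0.75 & -0.09 \\
0 & 0 & 0 & 0 & 0 & 0 & -0.58 & 0 & -0.42 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.91 & -0.09
\end{pmatrix}. \tag{12.73}$$
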

These have the same sign pattern as the sensitivity of $\eta_1$, indicating that any
increase in life expectancy will be accompanied by an increase in the variance
of longevity. The coefficient of variation takes this joint change into account;
from (12.39),

$$\frac{\epsilon\,\mathrm{CV}(\eta)}{\epsilon\theta^{\mathsf T}} = \begin{pmatrix}
0.04 & 0.02 & 0.03 & 0.00 & -0.00 & -0.00 & 0.01 & -0.00 & -0.31 \\
0 & -0.00 & 0.02 & -0.01 & 0.00 & -0.01 & 0.01 & -0.01 & -0.38 \\
0 & 0 & -0.01 & -0.03 & 0.01 & -0.02 & 0.01 & -0.04 & -0.21 \\
0.00 & 0.00 & 0.00 & -0.00 & -0.07 & -0.08 & 0.32 & -0.14 & 0.19 \\
0 & 0 & 0 & 0.00 & 0.00 & -0.30 & 0.00 & -0.27 & -0.09 \\
0 & 0 & 0 & 0 & 0 & 0.00 & 0.00 & 0.00 & 0.00 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.00 & 0.00
\end{pmatrix}. \tag{12.74}$$


Most of these elasticities are small, suggesting that the mean and standard 
deviation respond roughly proportionally, so that the CV does not change much. 

The matrix $B$ in (12.48), giving the ultimate probability of death from CRC (row
1) or other causes of death (row 2), is

$$B = \begin{pmatrix}
0.1 & 0.2 & 0.4 & 0.7 & 0.9 & 0.6 & 0.9 \\
0.9 & 0.8 & 0.6 & 0.3 & 0.1 & 0.4 & 0.1
\end{pmatrix}. \tag{12.75}$$



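Focusing on the probability of death due to CRC, the sensitivity and elasticity,
from (12.50), are

$$\frac{d\,\operatorname{vec} B(1,:)}{d\theta^{\mathsf T}} = \begin{pmatrix}
3.5 & 1.0 & 2.1 & 0.0 & -0.0 & 0.0 & 0.4 & 0.0 & -7.1 \\
0 & 2.5 & 5.2 & 0.1 & -0.1 & 0.0 & 0.9 & 0.1 & -11.5 \\
0 & 0 & 8.4 & 0.1 & -0.1 & 0.0 & 1.5 & 0.1 & -12.5 \\
0 & 0 & 0 & 0.2 & -0.2 & 0.1 & 3.0 & 0.2 & -8.5 \\
0 & 0 & 0 & 0 & 0 & 0.1 & 0.0 & 0.4 & -5.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 8.1 & 0 & -11.1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.4 & -3.9
\end{pmatrix}$$

$$\frac{\epsilon\,\operatorname{vec} B(1,:)}{\epsilon\theta^{\mathsf T}} = \begin{pmatrix}
0.6 & 0.4 & 0.5 & 0.1 & -0.1 & 0.0 & 0.1 & 0.1 & -1.7 \\
0 & 0.4 & 0.5 & 0.1 & -0.1 & 0.0 & 0.1 & 0.1 & -1.2 \\
0 & 0 & 0.5 & 0.1 & -0.1 & 0.0 & 0.1 & 0.1 & -0.8 \\
0 & 0 & 0 & 0.1 & -0.1 & 0.0 & 0.1 & 0.0 & -0.3 \\
0 & 0 & 0 & 0 & 0 & 0.0 & 0 & 0.1 & -0.1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0 & -0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.1 & -0.1
\end{pmatrix}$$
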


The probability of death from CRC could be reduced by increasing the mortality 
rate due to other causes (last column), although this is not an attractive treatment 
option. A more useful interpretation of the last column is as an indication of the 
increase in death from CRC that would result from reducing other causes of death. 

For normal individuals, the risk of death from CRC is most elastic to changes in
$\lambda_1$, $\lambda_2$, and $\lambda_3$ (row 1). The row sums of the elasticity matrix, corresponding
to the effects of a proportional change in all rates, are zero because a change of time
scale does not affect the probability of absorption.
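
The zero row sums can be confirmed numerically. The sketch below assumes NumPy and takes the absorption probabilities to be the mortality block times the fundamental matrix, which I assume is what (12.48) states. Differentiating that product gives the sensitivity used here; it is my own derivation, offered as an illustration and not as a transcription of (12.50). Helper names are assumptions.

```python
import numpy as np

THETA = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])   # (12.65)

def crc_blocks(theta):
    """Return U (7x7) and M (2x7) of Q in (12.64); column j holds rates out of state j."""
    lam, mu = theta[:8], theta[8]
    U, M = np.zeros((7, 7)), np.zeros((2, 7))
    for j, i, k in [(1, 2, 1), (2, 3, 2), (3, 4, 3), (4, 5, 4), (4, 6, 5), (5, 7, 6)]:
        U[i - 1, j - 1] = lam[k - 1]
    M[0, 5], M[0, 6] = lam[6], lam[7]
    M[1, :] = mu
    U -= np.diag(U.sum(axis=0) + M.sum(axis=0))
    return U, M

def vec(A):
    """Stack the columns of A into a vector (column-major order)."""
    return A.reshape(-1, order="F")

U, M = crc_blocks(THETA)
N1 = -np.linalg.inv(U)
B = M @ N1                                   # B[k, j]: probability of ending in absorbing state k from j

# U and M are linear in theta, so their derivatives are the blocks built from unit vectors.
dvecU = np.column_stack([vec(crc_blocks(e)[0]) for e in np.eye(9)])   # 49 x 9
dvecM = np.column_stack([vec(crc_blocks(e)[1]) for e in np.eye(9)])   # 14 x 9

# dB = (dM) N1 + M dN1 = (dM) N1 + B (dU) N1, then apply the vec operator.
dvecB = np.kron(N1.T, np.eye(2)) @ dvecM + np.kron(N1.T, B) @ dvecU   # 14 x 9

S = dvecB[0::2, :]                           # even entries of vec B are row 1, i.e. B(1, :)
E = S * THETA[None, :] / B[0][:, None]       # elasticities of the probability of death from CRC

print(np.round(B, 1))
print(np.round(E.sum(axis=1), 12))
```

The last line prints the row sums of the elasticity matrix, which vanish up to round-off for every starting state.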
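
12.6.2 Sensitivity of the Embedded Chain

The transition matrix $\tilde{P}$ in (12.54) for the embedded chain is

$$\tilde{P} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.41 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.61 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.49 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.59 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.38 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.96 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.58 & 0.91 & 1 & 0 \\
0.59 & 0.39 & 0.51 & 0.03 & 0.04 & 0.42 & 0.09 & 0 & 1
\end{pmatrix}. \tag{12.76}$$

The fundamental matrix $\tilde{N}_1$ from (12.57) is

$$\tilde{N}_1 = \begin{pmatrix}
1.0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 1.0 & 0 & 0 & 0 & 0 & 0 \\
0.2 & 0.6 & 1.0 & 0 & 0 & 0 & 0 \\
0.1 & 0.3 & 0.5 & 1.0 & 0 & 0 & 0 \\
0.1 & 0.2 & 0.3 & 0.6 & 1.0 & 0 & 0 \\
0.1 & 0.1 & 0.2 & 0.4 & 0 & 1.0 & 0 \\
0.1 & 0.2 & 0.3 & 0.6 & 1.0 & 0 & 1.0
\end{pmatrix}. \tag{12.77}$$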


In this continuous-time chain, states cannot be re-entered (cf. Fig. 12.1). Because a
state can be visited at most once, the mean number of visits is also the probability
of ever entering the state. Thus the probabilities that a normal individual will
ever suffer early or late clinical CRC are $\tilde{N}_1(6,1) = 0.1$ and $\tilde{N}_1(7,1) = 0.07$,
respectively. These probabilities increase for individuals in successively later stages;
for an individual with large adenoma the probabilities are $\tilde{N}_1(6,3) = 0.2$ and
$\tilde{N}_1(7,3) = 0.3$, respectively.


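Focusing sensitivity analysis on individuals in the normal state (state 1), the
sensitivities and elasticities of the number of visits are

$$\frac{d\tilde{N}_1(:,1)}{d\theta^{\mathsf T}} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
15.9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -11.0 \\
9.7 & 2.8 & 0 & 0 & 0 & 0 & 0 & 0 & -11.1 \\
4.8 & 1.4 & 2.9 & 0 & 0 & 0 & 0 & 0 & -8.3 \\
2.8 & 0.8 & 1.7 & 0.1 & -0.1 & 0 & 0 & 0 & -5.0 \\
1.8 & 0.5 & 1.1 & -0.1 & 0.1 & 0 & 0 & 0 & -3.2 \\
2.7 & 0.8 & 1.6 & 0.1 & -0.1 & 0.0 & 0 & 0 & -4.9
\end{pmatrix} \tag{12.78}$$

and

$$\frac{\epsilon\tilde{N}_1(:,1)}{\epsilon\theta^{\mathsf T}} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.6 \\
0.6 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 & -1.0 \\
0.6 & 0.4 & 0.5 & 0 & 0 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & 0.4 & -0.4 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & -0.6 & 0.6 & 0 & 0 & 0 & -1.5 \\
0.6 & 0.4 & 0.5 & 0.41 & -0.4 & 0.04 & 0 & 0 & -1.5
\end{pmatrix}. \tag{12.79}$$
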


The sensitivities and elasticities of the probability of contracting clinical CRC are
given by the last two rows. These probabilities are highly elastic to $\lambda_1$, $\lambda_2$ and $\lambda_3$.
The elasticities to $\mu$ indicate that every 1% reduction in mortality due to other causes
will cause about a 1.5% increase in the probability of experiencing clinical CRC.
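
The chain of derivatives in (12.58), (12.59), and (12.61) can be sketched for the CRC model as follows, assuming NumPy. The helper names are my own, and the derivative of the transient block with respect to the parameters is obtained from the linearity of that block in the parameters.

```python
import numpy as np

THETA = np.array([1.52e-2, 3.46e-2, 2.15e-2, 3.70e-1, 2.38e-1,
                  4.85e-1, 3.02e-2, 2.10e-1, 2.20e-2])   # (12.65)

def crc_blocks(theta):
    """Return U (7x7) and M (2x7) of Q in (12.64); column j holds rates out of state j."""
    lam, mu = theta[:8], theta[8]
    U, M = np.zeros((7, 7)), np.zeros((2, 7))
    for j, i, k in [(1, 2, 1), (2, 3, 2), (3, 4, 3), (4, 5, 4), (4, 6, 5), (5, 7, 6)]:
        U[i - 1, j - 1] = lam[k - 1]
    M[0, 5], M[0, 6] = lam[6], lam[7]
    M[1, :] = mu
    U -= np.diag(U.sum(axis=0) + M.sum(axis=0))
    return U, M

def vec(A):
    """Stack the columns of A into a vector (column-major order)."""
    return A.reshape(-1, order="F")

U, M = crc_blocks(THETA)
I = np.eye(7)
Udg_inv = np.diag(1.0 / np.diag(U))
U_tilde = I - U @ Udg_inv                        # (12.55)
N_tilde = np.linalg.inv(I - U_tilde)             # (12.57)

# (12.59): d vec U-tilde / d vec^T U
dUt_dU = -np.kron(Udg_inv, I) + np.kron(Udg_inv, U @ Udg_inv) @ np.diag(vec(I))
# d vec U / d theta^T, exact because U is linear in theta
dvecU = np.column_stack([vec(crc_blocks(e)[0]) for e in np.eye(9)])
# (12.61): chain rule through (12.58) and (12.59)
dvecNt = np.kron(N_tilde.T, N_tilde) @ dUt_dU @ dvecU

S = dvecNt[:7, :]                                # sensitivities of column 1 of N-tilde
E = S * THETA[None, :] / N_tilde[:, [0]]         # corresponding elasticities

print(np.round(S, 1))
print(np.round(E, 2))
```

Visit probabilities are unchanged when all rates are multiplied by a common factor, so each row of the elasticity matrix sums to zero; the positive elasticities to the progression rates are balanced by the negative elasticity to mortality from other causes.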


12.7 Discussion 


The results of this chapter have been presented in terms of differentials of, or
derivatives with respect to, a general vector $\theta$ of parameters. The nature of these
parameters and their relation to $Q$, $U$, or $M$ can be very general. At its simplest,
$\theta$ could consist of some subset of the elements of $Q$. This is the case in the
CRC example (Sect. 12.6), in which the parameters are the transition rates $\lambda_i$ and
the mortality rate $\mu$. More generally, the transition rates might themselves be written
as functions of other variables. For example, in Van Den Hout and Matthews
(2009a,b) the rates are written as $q_{ij} = \exp\left(\beta_{ij}^{\mathsf T} z\right)$, $i \neq j$, where $z$ is a vector of
covariates (e.g., age, medical care) and $\beta_{ij}$ is a vector of coefficients to be estimated.
The results presented here can be applied directly to such cases, and indeed to even
more complicated functional dependencies, using the chain rule. Thus, focusing
on parametric dependence is not only scientifically valuable (these are, after all,
the relationships of interest in applications of Markov chains) but also extremely
general.
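
The chain-rule argument can be sketched numerically. The example below is hypothetical: a healthy/ill/dead model with log-linear rates of the form just described, where the covariate vector, the coefficients, and the function names are all invented for illustration and are not taken from Van Den Hout and Matthews. It obtains the derivative of life expectancy with respect to the coefficients by multiplying three Jacobians, then compares the result with central finite differences.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into a vector (column-major order)."""
    return A.reshape(-1, order="F")

z = np.array([1.0, 0.5])               # hypothetical covariates: intercept, scaled age
beta = np.array([[-1.0, 0.4],          # healthy -> ill   (all coefficients made up)
                 [-3.0, 0.8],          # healthy -> dead
                 [-1.5, 0.6]])         # ill -> dead

def transient_block(q):
    """Transient block U for rates q = (q21, q31, q32); columns are 'from' states."""
    q21, q31, q32 = q
    return np.array([[-(q21 + q31), 0.0],
                     [q21, -q32]])

def life_expectancy(beta_flat):
    """Mean time to death from each transient state, as a function of the coefficients."""
    q = np.exp(beta_flat.reshape(3, 2) @ z)
    return (-np.linalg.inv(transient_block(q))).sum(axis=0)

q = np.exp(beta @ z)
N1 = -np.linalg.inv(transient_block(q))

dvecU_dq = np.column_stack([vec(transient_block(e)) for e in np.eye(3)])  # U is linear in q
dq_dbeta = np.kron(np.diag(q), z[None, :])     # d q_k / d beta_k = q_k z^T, block diagonal
deta_dvecN = np.kron(np.eye(2), np.ones((1, 2)))  # eta^T = 1^T N1 (column sums)

# Chain rule: d eta / d beta^T
deta_dbeta = deta_dvecN @ np.kron(N1.T, N1) @ dvecU_dq @ dq_dbeta

# Finite-difference check
h = 1e-6
b0 = beta.reshape(-1)
fd = np.column_stack([(life_expectancy(b0 + h * e) - life_expectancy(b0 - h * e)) / (2 * h)
                      for e in np.eye(6)])
print(np.max(np.abs(deta_dbeta - fd)))
```

Only the last Jacobian depends on the functional form chosen for the rates; replacing the log-linear model by another parameterization changes that one factor and leaves the rest of the calculation untouched.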

Epidemic models are often written as continuous-time Markov chains, specified
in terms of rates of movement among infection states. Gómez-Corral and López-García
(2018) extended the methods of this chapter to a model in which individuals
are classified by two state variables (a level-dependent quasi-birth-death process).
The model may be considered a continuous-time analog of the age×stage models of
Chap. 6 (Caswell 2012; Caswell and Salguero-Gómez 2013; Caswell et al. 2018).
Their approach takes advantage of the block structure of the intensity matrix for such
processes. They have also applied the approach to receptor-ligand complexes within
cells (López-García et al. 2018). As far removed from demography as molecules
may seem, the concepts of i-state transitions, of inferring population behavior from
individual trajectories, and of sensitivity analysis still apply. That’s a good thing.